Abstract: This paper characterizes the capacity of sampled analog
channels under a sub-Nyquist sampling rate constraint. The
analog channel is assumed to be a linear time invariant additive 
Gaussian channel, where perfect channel knowledge is available 
at both the transmitter and the receiver. We consider a general 
class of time-preserving sampling methods which includes irreg- 
ular nonuniform sampling. Our results indicate that the optimal 
sampling structures extract out the set of frequencies that exhibits 
the highest signal-to-noise ratio among all spectral sets of measure 
equal to the sampling rate, and hence suppress aliasing. The 
capacity under sub-Nyquist sampling can be attained through 
filterbank sampling with uniform sampling at each branch
with possibly different rates, or through a single branch of 
modulation and filtering followed by uniform sampling. These 
results indicate that for a large class of channels, employing 
irregular nonuniform sampling sets, while typically complicated 
to realize, does not provide capacity gain over uniform sampling 
sets with appropriate preprocessing. In addition, we demonstrate 
that aliasing or scrambling of spectral components does not 
provide capacity gain, which is in contrast to the benefits obtained 
from random mixing in spectrum-blind compressive sampling 
schemes. 

Index Terms — nonuniform sampling, sampled analog channels, 
sub-Nyquist sampling, channel capacity 

I. Introduction 

The study of capacity of analog Gaussian channels and 
capacity-achieving transmission strategies was pioneered by 
Shannon [1]. These results have provided fundamental insights
for modern communication system design. Shannon's work 
focused on capacity of analog channels sampled at or above 
twice the channel bandwidth. However, these results do not 
explicitly account for sub-Nyquist sampling rate constraints 
that may be imposed by hardware limitations. This motivates 
the exploration of the effects of sub-Nyquist sampling upon the 
capacity of an analog Gaussian channel, and the fundamental 
capacity limits that result when considering general sampling 
methods that include irregular nonuniform sampling. 

A. Related Work and Motivation 

Shannon introduced and derived the information theoretic
metric of channel capacity for time-invariant analog waveform
channels [2], which established the optimality of water-filling
power allocation based on signal-to-noise ratio (SNR) over the
spectral domain [3], [4]. A key idea in determining the analog
channel capacity is to convert the continuous-time channel
into a set of parallel discrete-time channels based on the
Shannon-Nyquist sampling theorem [6]. This paradigm
was employed, for example, by Medard et al. to bound the
maximum mutual information in time-varying channels [7],
[8], and was used by Forney et al. to investigate coding
and modulation for Gaussian channels [9]. Most of these
results focus on the analog channel capacity commensurate
with uniform sampling at or above the Nyquist rate associated
with the channel bandwidth. Another line of work
characterizes the effects of oversampling with quantization
upon information rates [10], [11]. In practice, however,
hardware and power limitations may preclude sampling at the
Nyquist rate for a wideband communication system.

More general irregular sampling methods beyond pointwise
uniform sampling have been extensively studied in the
sampling literature, e.g. [12]-[14]. One example is sampling
on non-periodic quasicrystal sets, which has been shown to
be stable for bandlimited signals [15], [16]. These sampling
approaches are of interest in realistic situations where
signals can only be sampled on a nonuniformly spaced sampling
set due to constraints imposed by data acquisition devices.
Many sophisticated reconstruction algorithms have been
developed for the class of bandlimited signals or, more generally,
the class of shift-invariant signals [12], [17], [18]. For all these
nonuniform sampling methods, the Nyquist sampling rate is
necessary for perfect recovery of bandlimited signals [12],
[19], [20].

When we go beyond bandlimited signals, however, the
Nyquist sampling rate can be excessive when certain signal
structures are properly exploited [21], [22]. For example,
consider multiband signals, whose spectral contents reside
within several subbands over a wide spectrum. If the spectral
support is known, then the necessary sampling rate for the
multiband signals is their spectral occupancy, termed the
Landau rate [23]. Such signals admit perfect recovery when
sampled at rates approaching the Landau rate, provided that the
sampling sets are appropriately chosen (e.g. [24], [25]). One
type of sampling mechanism that can reconstruct multiband
signals sampled at the Landau rate is a filter bank followed
by sampling, studied in [26]-[28]. Inspired by recent
"compressive sensing" [29]-[31] ideas, spectrum-blind sub-Nyquist
sampling for multiband signals with random modulation has
been developed [32]. Sub-Nyquist nonuniform sampling for
signal reconstruction was also proposed for signals with finite
rate of innovation [33], [34].
Although sub-Nyquist nonuniform sampling methods have
been extensively explored in the sampling literature, they are
typically investigated either under a noiseless setting, or based
on statistical reconstruction measures (e.g. mean squared error
(MSE)) instead of information theoretic measures. Gastpar
et al. [35] explored the necessary sampling density for
nonuniform sampling, and recent work by Wu and Verdu
[36], [37] investigated the tradeoff between the number of
samples and the reconstruction fidelity through information
theoretic measures. However, these works did not explicitly
consider the capacity metric for an analog channel. The most
relevant capacity result was by Berger et al. [38], who related
MSE-based optimal sampling with capacity for several special
types of channels. But they did not derive the sub-Nyquist
sampled channel capacity for more general channels, nor did
they consider nonuniformly spaced sampling. Our recent work
[39] established a new framework that characterizes sampled
capacity for a broad class of sampling methods, including
filter and modulation bank sampling [32], [40], [41]. For
these sampling methods, we determined optimal sampling
structures based on capacity as a metric, and illuminated intriguing
connections between MIMO channel capacity and capacity of
undersampled channels, as well as a new connection between
capacity and MSE. However, this prior work did not investigate
analog channel capacity using more general nonuniform
sampling under a sub-Nyquist sampling rate constraint.

One interesting fact discovered in [39] is the
non-monotonicity of capacity with sampling rate under filter- and
modulation-bank sampling, assuming an equal sampling rate
per branch for a given number of branches. This indicates
that more sophisticated sampling techniques, adaptive to the 
channel response and the sampling rate, are needed to max- 
imize capacity under sub-Nyquist rate constraints, including 
both uniform and nonuniform sampling. However, none of 
the aforementioned works have investigated the question as to 
which sampling method can best exploit the channel structure, 
thereby maximizing sampled capacity under a given sampling 
rate constraint. Although several classes of sampling methods
were shown in [39] to have closed-form capacity solutions,
the capacity limits might not even exist for general sampling 
methods. This raises the question as to whether there exists 
a capacity upper bound over a general class of sub-Nyquist 
sampling systems beyond the classes we discussed before and, 
if so, when the bound is achievable. That is the question we 
investigate herein. 

B. Contributions and Organization 

Our main contribution is to derive the maximum capacity 
of sub-Nyquist sampled analog channels achievable by a gen- 
eral class of time-preserving nonuniform sampling methods, 
under a sub-Nyquist sampling rate constraint. The channel is 
assumed to be a linear time invariant (LTI) additive Gaussian 
channel, where perfect channel knowledge is available at both 
the transmitter and the receiver. Moreover, the class of
sampling systems we consider subsumes sampling with irregular
nonuniform sampling grids.

We first develop in Theorem 2 an upper bound on the
sampled channel capacity, which corresponds to the capacity
of a channel whose spectral occupancy is no larger than the
sampling rate $f_s$. As a key step in the analysis framework
for Theorem 2, we characterize in closed form the sampled
channel capacity for any specific periodic sampling system,
formally defined in Definition 8. We demonstrate that this
fundamental capacity limit can be achieved by filterbank
sampling with varied sampling rates at different branches,
or by a single branch of modulation and filtering followed
by a uniform sampling set (Theorems 3 and 4). In particular, the
optimal sampler extracts out a spectral set of size $f_s$ with the
highest SNR, and suppresses all signal and noise components
outside this spectral set.

Table I. Summary of notation.

$\mu(\cdot)$ : Lebesgue measure
$\Lambda$ : sampling set $\{t_n : n \in \mathbb{Z}\}$
$D^+(\Lambda)$, $D^-(\Lambda)$, $D(\Lambda)$ : upper, lower, and uniform Beurling density of $\Lambda$
$\mathcal{L}^2(\Omega)$ : set of measurable functions $f$ supported on the set $\Omega$ such that $\int |f|^2 < \infty$
$\mathbb{S}_+$ : set of positive semidefinite matrices
$h(t)$, $H(f)$ : impulse response, and frequency response of the LTI analog channel
$s_i(t)$, $S_i(f)$ : impulse response, and frequency response of the $i$th (post-modulation) filter
$p(t)$, $P(f)$ : impulse response, and frequency response of the pre-modulation filter
$S_\eta(f)$ : power spectral density of the noise $\eta(t)$
$f_s$, $T_s$ : aggregate sampling rate, and the corresponding sampling interval ($T_s = 1/f_s$)
$q(t, \tau)$ : impulse response of the sampling system, i.e. the output seen at time $t$ due to an impulse in the input at time $\tau$
$T_q$ : period of the modulating sequence $q(t)$ such that $T_q = 1/f_q$
$\mathbb{Z}$, $\mathbb{R}$ : set of integers, and set of real numbers

Our results indicate that irregular nonuniform sampling 
sets, while typically complicated to realize in hardware, do 
not outperform analog preprocessing with regular uniform 
sampling sets in maximizing capacity. We also show that when 
optimal filterbank or modulation sampling is employed, a mild 
perturbation of the optimal sampling grid does not change the 
capacity. Finally, we demonstrate that aliasing or scrambling 
of spectral contents does not provide capacity gain. This is 
in contrast to the benefits obtained from random mixing of 
frequency components in many sub-Nyquist sampling schemes 
with unknown signal support (e.g. [32]).

The remainder of this paper is organized as follows. In
Section II, we introduce our system models of sampled analog
channels, and provide formal definitions of time-preserving
systems, sampling rates, and sampled channel capacity. We
then develop, in Section III-A, an upper bound on the sampled
channel capacity ranging over all time-preserving sampling
methods, along with an approximate analysis highlighting
insights of the result. The achievability of this upper bound is
derived in Section III-B. The key steps underlying the proof
of Theorem 2 are sketched in Section V, with more details
deferred to the appendices. The implications of our main
results are summarized in Section IV. Notation is summarized
in Table I.
II. Sampled Channel Capacity

A. System Model 

In this paper, we consider an analog waveform channel,
which is modeled as an LTI filter with impulse response $h(t)$
and frequency response $H(f) = \int_{-\infty}^{\infty} h(t) \exp(-j 2\pi f t)\, dt$.
With $x(t)$ denoting the transmitted signal, the analog channel
output is given by

$$ r(t) = h(t) * x(t) + \eta(t), \tag{1} $$

where the noise process $\eta(t)$ is assumed to be an additive
stationary zero-mean Gaussian process with power spectral
density $S_\eta(f)$. Unless otherwise specified, we assume throughout
that perfect channel state information is known at both the
transmitter and the receiver.

The analog channel output is passed through $M$ ($1 \le M \le \infty$)
branches of linear preprocessing systems, each followed by
a pointwise sampler, as illustrated in Fig. 1. The preprocessed
output $y_i(t)$ at the $i$th branch is obtained by applying a linear
bounded operator $\mathcal{T}_i$ to the channel output $r(t)$:

$$ y_i(t) = \mathcal{T}_i(r(t)). $$

Note that the linear operator can be time-varying, and
subsumes filtering and modulation as special cases. We denote
by $q(t, \tau)$ the impulse response of the time-varying system,
i.e. the output seen at time $t$ due to an impulse in the input at
time $\tau$. For example, a modulation system $\mathcal{T}(x(t)) = p(t) x(t)$
for some given modulation sequence $p(t)$ has an impulse
response $q(t, \tau) = p(t) \delta(t - \tau)$. A cascade combination of
two systems $\mathcal{T}_1$ and $\mathcal{T}_2$ has an impulse response
$q(t, \tau) = \int q_2(t, \tau_1) q_1(\tau_1, \tau)\, d\tau_1$, with $q_1(\cdot,\cdot)$ and $q_2(\cdot,\cdot)$ denoting
respectively the impulse responses of $\mathcal{T}_1$ and $\mathcal{T}_2$ [42]. When
$\mathcal{T}$ is LTI, we use $q(\tau) = q(t, t - \tau)$ as shorthand to denote its
impulse response.

The pointwise sampler following the preprocessor can be
uniform or irregular [12]. Specifically, the preprocessed output
$y_i(t)$ (at the $i$th branch) is sampled at times $t_{i,n}$ ($n \in \mathbb{Z}$),
yielding a sample sequence $y_i[n] = y_i(t_{i,n})$. Here, we define
the sampling set $\Lambda_i$ at the $i$th branch as

$$ \Lambda_i = \{ t_{i,n} \mid n \in \mathbb{Z} \}. $$

In particular, if $t_{i,n} = n T_{i,s}$, then the sampling set at the $i$th
branch is said to be uniform with period $T_{i,s}$.

B. Sampling Rate Definition 

Our metric of interest is the sampled channel capacity under 
a sampling rate constraint. We first formally define sampling 
rate for general nonuniform sampling mechanisms. 

In general, the sampling set $\Lambda = \{ t_n \mid n \in \mathbb{Z} \}$ may be
irregular and hence aperiodic, which calls for a generalized
definition of sampling rate. One notion commonly used in
sampling theory is the Beurling density introduced by Beurling
[19] and Landau [23], as defined below [12].

Definition 1 (Beurling Density). For a sampling set $\Lambda =
\{ t_k \mid k \in \mathbb{Z} \}$, the upper and lower Beurling density are given
respectively as

$$ D^+(\Lambda) = \lim_{r \to \infty} \sup_{z \in \mathbb{R}} \frac{\mathrm{card}(\Lambda \cap [z, z + r])}{r}, $$

$$ D^-(\Lambda) = \lim_{r \to \infty} \inf_{z \in \mathbb{R}} \frac{\mathrm{card}(\Lambda \cap [z, z + r])}{r}. $$

When $D^+(\Lambda) = D^-(\Lambda)$, the sampling set $\Lambda$ is said to be of
uniform Beurling density $D(\Lambda) := D^-(\Lambda)$.

Figure 1. The input $x(t)$ is passed through the analog channel and
contaminated by noise $\eta(t)$. The analog channel output $r(t)$ is then passed
through $M$ ($1 \le M \le \infty$) linear preprocessing systems $\{ \mathcal{T}_i \mid 1 \le i \le M \}$.
At the $i$th branch, the preprocessed output $y_i(t)$ is sampled on the sampling
set $\Lambda_i = \{ t_{i,n} \mid n \in \mathbb{Z} \}$.

When the sampling set is uniform with period $T_s$, the
Beurling density is $D(\Lambda) = 1/T_s$, which coincides with our
conventional definition of the sampling rate. The definition
of Beurling density allows the Shannon-Nyquist sampling 
theorem to be extended to nonuniform sampling. Moreover, 
we will use Beurling density to define sampling rate for a 
large class of sampling mechanisms with preprocessing. 
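As a numerical illustration of Definition 1, the following sketch estimates the upper and lower Beurling densities of a finite sampling set by sliding a window of length $r$ across it. The function name `beurling_density_estimate`, the window length, the number of window offsets, and the jitter level are all choices made here for illustration and do not come from the paper; a finite window only approximates the limit $r \to \infty$.

```python
import numpy as np

def beurling_density_estimate(samples, r, num_offsets=2000):
    """Approximate (D^-, D^+) of a finite sampling set using windows [z, z + r]."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # Keep every window inside the observed range so edges do not bias the counts.
    z = np.linspace(samples[0], samples[-1] - r, num_offsets)
    # card(Lambda ∩ [z, z + r]) via binary search on the sorted sample times.
    counts = (np.searchsorted(samples, z + r, side="right")
              - np.searchsorted(samples, z, side="left"))
    return counts.min() / r, counts.max() / r

T_s = 0.1                                   # uniform sampling period (assumed value)
uniform = np.arange(0, 1000, T_s)           # uniform set with rate 1 / T_s = 10
rng = np.random.default_rng(0)
jittered = uniform + rng.uniform(-0.03, 0.03, size=uniform.size)  # irregular set

d_lo_u, d_hi_u = beurling_density_estimate(uniform, r=100.0)
d_lo_j, d_hi_j = beurling_density_estimate(jittered, r=100.0)
```

Under these assumptions both the uniform set and its mildly jittered version should yield upper and lower estimates close to $1/T_s$, which is consistent with treating the Beurling density as the sampling rate of an irregular set.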

Under nonuniform $\Lambda$, the set of exponential functions
$\{ \exp(j 2\pi t_n f) \mid n \in \mathbb{Z} \}$ forms a non-harmonic Fourier
series [13]. Whether the class of original signals is
recoverable from the nonuniformly sampled sequence is
determined by the completeness of the associated
non-harmonic set. In particular, when $\Lambda$ is uniform, the set
$\{ \exp(j 2\pi t_n f) \mid n \in \mathbb{Z}, t_n = n / f_s \}$ with $D(\Lambda) = f_s$ forms a
Riesz basis [43] of $\mathcal{L}^2(-f_s/2, f_s/2)$ by the Shannon-Nyquist
sampling theorem. For the class of sampling systems without
preprocessing, a fundamental rate limit necessary for perfect
reconstruction of bandlimited signals has been characterized
by Landau using the definition of Beurling density as described
in the following theorem.

Theorem 1 (Landau Rate [23]). Consider the set $\mathcal{B}_\Omega$ of
signals whose spectral contents are supported on the
frequency set $\Omega$ and pointwise sampled with a sampling set
$\Lambda$ without preprocessing. All signals $f(t) \in \mathcal{B}_\Omega$ can be
uniquely determined by their samples $\{ f(t_n) \mid t_n \in \Lambda \}$ only if
$D^-(\Lambda) \ge \mu(\Omega)$, where $\mu(\cdot)$ denotes the Lebesgue measure.
The value $\mu(\Omega)$ is termed the Landau rate.

Theorem 1 characterizes the fundamental sampling rate
requirement for perfect signal reconstruction under pointwise
nonuniform sampling without preprocessing. In particular, if
$\Omega = [-B/2, B/2]$, then Theorem 1 reduces to the
Shannon-Nyquist theorem. Given the preprocessed output $y(t)$, we can
now use Beurling density to characterize the sampling rate on
$y(t)$. However, since the preprocessor might distort the time
scale of the input, the resulting "sampling rate" might not
make physical sense, as illustrated in the following example.

Example 1 (Compressor). Consider a preprocessing system
defined by the relation

$$ y(t) = r(Lt) $$

with $L \ge 2$ being a positive integer. If we apply a uniform
sampling set $\Lambda = \{ t_n : t_n = n / f_s \}$ on the preprocessed
output $y(t)$, the sampled sequence at a "sampling rate" $f_s$ is
given by

$$ y[n] = y(n / f_s) = r(nL / f_s), $$

which corresponds to sampling the system input $r(t)$ at rate
$f_s / L$. The compressor effectively time-warps the signal, thus
resulting in a mismatch of the time scales between the input
and output.
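The identity in Example 1 can be checked numerically. The sketch below uses a hypothetical test waveform `r` and assumed values for $L$ and $f_s$; only the relation $y(t) = r(Lt)$ is taken from the example.

```python
import numpy as np

def r(t):
    """Hypothetical channel output, used only for illustration."""
    return np.cos(2 * np.pi * 3.0 * t) + 0.5 * np.sin(2 * np.pi * 7.0 * t)

L = 4          # compression factor (assumed value, L >= 2)
f_s = 20.0     # nominal "sampling rate" applied to the compressor output

def y(t):
    """Compressor output: y(t) = r(L t)."""
    return r(L * t)

n = np.arange(50)
samples_of_y = y(n / f_s)            # uniform sampling of y(t) at rate f_s
samples_of_r = r(n / (f_s / L))      # uniform sampling of r(t) at rate f_s / L
same = np.allclose(samples_of_y, samples_of_r)
```

The two sequences coincide, so a sampler running at nominal rate $f_s$ after the compressor observes the channel output only at the effective rate $f_s / L$.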

The compressor example illustrates that the notion of sam- 
pling rate may be misleading for systems that experience 
time warping. Hence, our results will focus on sampling that 
preserves time scales. One class of linear systems that preserve
time scales consists of modulation operators ($y(t) = p(t) x(t)$), which
perform pointwise scaling of the input, and hence do not
change the time scale. Another class is the periodic system,
which includes LTI filtering, defined as follows.

Definition 2 (Periodic System). A linear preprocessing
system is said to be periodic with period $T_q$ if its impulse response
$q(t, \tau)$ satisfies

$$ q(t, \tau) = q(t + T_q, \tau + T_q), \quad \forall t, \tau \in \mathbb{R}. \tag{2} $$
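To make condition (2) concrete, the following sketch checks it numerically for one periodic system: modulation by a $T_q$-periodic sequence followed by an LTI filter, whose kernel is $q(t, \tau) = g(t - \tau) p(\tau)$ by the cascade rule of Section II-A. The specific waveforms `p` and `g` and the value of `T_q` are assumptions chosen for illustration.

```python
import numpy as np

T_q = 0.5   # period of the modulating sequence (assumed value)

def p(t):
    """Hypothetical T_q-periodic modulating sequence."""
    return 1.0 + 0.5 * np.cos(2 * np.pi * t / T_q)

def g(t):
    """Hypothetical LTI filter impulse response."""
    return np.exp(-t ** 2)

def q(t, tau):
    """Kernel of 'modulate by p, then filter by g': q(t, tau) = g(t - tau) p(tau)."""
    return g(t - tau) * p(tau)

rng = np.random.default_rng(1)
t = rng.uniform(-5, 5, 1000)
tau = rng.uniform(-5, 5, 1000)

# Condition (2): shifting both arguments by T_q leaves the kernel unchanged.
is_periodic = np.allclose(q(t, tau), q(t + T_q, tau + T_q))
# A shift that is not a multiple of T_q changes the kernel, so the system is not LTI.
is_shift_invariant = np.allclose(q(t, tau), q(t + 0.1, tau + 0.1))
```

A pure LTI filter satisfies (2) for every $T_q$, which is why LTI filtering is a special case of a periodic system.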

A more general class of systems that preserve the time scale 
can be generated through modulation and periodic subsystems. 
Specifically, we can define a general time-preserving system
by connecting a set of modulation or periodic operators in
parallel or in series. This leads to the following definition.

Definition 3 (Time-preserving System). Given an index set
$\mathcal{I}$, a preprocessing system $\mathcal{T} : x(t) \mapsto \{ y_k(t), k \in \mathcal{I} \}$ is said
to be time-preserving if

(1) The system input is passed through $|\mathcal{I}|$ (possibly
countably many) branches of linear preprocessors, yielding a set of
analog outputs $\{ y_k(t) \mid k \in \mathcal{I} \}$.

(2) In each branch, the preprocessor comprises a set of
periodic or modulation operators connected in series.

With a preprocessing system that preserves the time scale, 
we can now define the aggregate sampling rate through the 
Beurling density. 

Definition 4 (Sampling Rate for Time-preserving Systems).
A sampling system is said to be time-preserving with sampling
rate $f_s$ if

(1) Its preprocessing system $\mathcal{T}$ is time-preserving.

(2) The preprocessed output $y_k(t)$ is sampled by a sampling
set $\Lambda_k = \{ t_{l,k} \mid l \in \mathbb{Z} \}$ with a uniform Beurling density $f_{k,s}$,
which satisfies $\sum_{k \in \mathcal{I}} f_{k,s} = f_s$.

^1 We note that the sampling system may comprise countably many branches,
each with non-zero sampling rate. For instance, the $k$th branch may be sampled
at a rate $f_{k,s}$ that decays with $k$ fast enough for the aggregate rate
$f_s = \sum_{k=1}^{\infty} f_{k,s}$ to be finite.



We note that our class of time-preserving sampling does 
not preclude random sampling schemes. For example, the 
preprocessing system can be a random modulator and the 
sampling set can be randomly spaced. Our definition also 
includes multibranch sampling methods. In fact, each multi- 
branch sampling can be converted to an equivalent single 
branch sampling as follows. 

Fact 1. Suppose that a multibranch sampling system has
sampling rate $f_s$. Then there exists a single branch sampling
system with sampling rate $f_s$ that yields the same set of
sampled output values as the original system for any input.

Proof: Suppose that the impulse response for the $k$th
branch is given by $q_k(t, \tau)$ with sampling set $\Lambda_k =
\{ t_{k,n} \mid n \in \mathbb{Z} \}$. Without loss of generality,^2 suppose that
$\Lambda_k \cap \Lambda_{k'} = \emptyset$ for any $k \ne k'$. By ordering all sample times
in $\cup_{k \in \mathcal{I}} \Lambda_k$ and renaming them to be $\{ t_l \mid l \in \mathbb{Z} \}$ such that
$t_l < t_{l+1}$ for all $l$, we can construct an equivalent single branch
sampling system such that

$$ q(t_l, \tau) = q_k(t_{k,n}, \tau) $$

if $t_l$ corresponds to $t_{k,n}$ in the original sampling set. The
sampling rate of the new system is given by $\sum_{k \in \mathcal{I}} f_{k,s} = f_s$.

The samples obtained through this new single branch system
preserve all information we can obtain from the samples of the
original multibranch system.
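The construction in the proof of Fact 1 amounts to interleaving the per-branch sampling sets while remembering which branch kernel produced each sample. The sketch below illustrates this for two uniform branches; the helper name `merge_branches`, the branch rates, the offsets, and the finite observation horizon are assumptions made here for illustration.

```python
import numpy as np

def merge_branches(branch_times):
    """Merge per-branch sampling times into one ordered single-branch set.

    Returns the sorted times t_l and, for each t_l, the index k of the branch
    (and hence of the kernel q_k) that produced it.
    """
    times = np.concatenate(branch_times)
    labels = np.concatenate([np.full(len(t), k) for k, t in enumerate(branch_times)])
    if len(np.unique(times)) != len(times):
        raise ValueError("sampling sets must be disjoint; shift one branch first")
    order = np.argsort(times)
    return times[order], labels[order]

horizon = 200.0                 # length of the observation window (assumed value)
rates = [2.0, 3.0]              # branch rates f_{k,s} (assumed values)
offsets = [0.0, 0.123]          # small shifts that keep the two sets disjoint
branches = [off + np.arange(int(horizon * f)) / f for f, off in zip(rates, offsets)]

t_merged, branch_of_sample = merge_branches(branches)
empirical_rate = len(t_merged) / horizon    # approximates the aggregate rate
```

Over the chosen window the merged set has a rate equal to the sum of the branch rates, and every merged sample is tagged with the branch whose kernel it should use, mirroring $q(t_l, \tau) = q_k(t_{k,n}, \tau)$.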

C. Capacity Definition 

There are two levels of capacity definitions that are of 
interest: (1) the sampled capacity for a given sampling system; 
(2) the maximum capacity over a large class of sampling 
systems under a sampling rate constraint. We now detail these 
definitions. 

Suppose that the transmit signal $x(t)$ is constrained to the
time interval $[-T, T]$, and the received signal $y(t)$ is sampled
with sampling rate $f_s$ and observed over the time interval
$[-T, T]$. For a given sampling system $\mathcal{P}$ that consists of a
preprocessor $\mathcal{T}$ and a sampling set $\Lambda$, and for a given time
duration $T$, the capacity $C_T^{\mathcal{P}}(f_s, P)$ is defined as

$$ C_T^{\mathcal{P}}(f_s, P) = \max_{p(x)} \frac{1}{2T} I\left( x([-T, T]) ; \{ y[t_n] \}_{[-T, T]} \right) \tag{3} $$

subject to a power constraint $\mathbb{E}\left( \frac{1}{2T} \int_{-T}^{T} |x(t)|^2\, dt \right) \le P$.
^2 In fact, if $\Lambda_k \cap \Lambda_{k'} \ne \emptyset$, then we can introduce a new shifted
pair $(\tilde{q}_k(t, \tau), \tilde{\Lambda}_k)$ such that $\tilde{q}_k(t + \delta, \tau) := q_k(t, \tau)$ and $\tilde{\Lambda}_k =
\{ t + \delta \mid t \in \Lambda_k \}$ for some $\delta$ such that $\tilde{\Lambda}_k \cap \Lambda_{k'} = \emptyset$, i.e. we can introduce
a certain delay to the preprocessed output and shift the sampling set
correspondingly. Clearly, this new sampling structure leads to the same collection of
sample values.

^3 Note that for any $T$, if we incorporate the sampling system into the
channel, then the sampled channel can be treated as a linear Gaussian channel
with a finite number of outputs. This channel can then be converted to a set of independent
parallel discrete channels by a Karhunen-Loeve decomposition. The optimal
strategy is then to send data separately through these parallel discrete channels,
and to perform a water-filling power allocation strategy.

^4 For a given sampling system $\mathcal{P}$, the sampling rate $f_s$ is fixed and no
longer a variable. However, since the main objective of this paper is to study
the effect of a sub-Nyquist sampling rate upon capacity, we include $f_s$ explicitly
in the capacity notation. This makes the notation easier to follow
when we investigate the maximum capacity over all sampling systems subject
to the sampling rate constraint.
Here, $\{ y[t_n] \}_{[-T, T]}$ denotes the set of samples obtained within
time $[-T, T]$, i.e. $\{ y[t_n] \mid n \in \mathbb{Z}, t_n \in [-T, T] \}$.

Any sampled analog channel can be converted to a set of
independent discrete channels via a Karhunen-Loeve
decomposition. The capacity metric $C_T^{\mathcal{P}}$ then quantifies the maximum
mutual information between the input and output of these
discrete channels, or equivalently, the maximum data rate that
can be conveyed through these channels with asymptotically
zero probability of error.

The capacity of the undersampled channel under a given
sampling system can be studied by taking the limit as $T \to
\infty$. It was shown in [39] that $\lim_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$ exists for
a broad class of sampling methods, including sampling via
filter banks and sampling via periodic modulation. We caution,
however, that the existence of the limit is not guaranteed for all
sampling methods, e.g. the limit might not exist for irregular
sampling. We therefore define the capacity and an upper bound
for a given sampling system as follows.

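Definition 5. (1) $C^{\mathcal{P}}(f_s, P)$ is said to be the capacity of a
given sampled analog channel if $\lim_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$ exists
and $C^{\mathcal{P}}(f_s, P) = \lim_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$;

(2) $C_u^{\mathcal{P}}(f_s, P)$ is said to be a capacity upper bound of the
sampled channel if $C_u^{\mathcal{P}}(f_s, P) \ge \limsup_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$.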

The above capacity is defined for a given sampling
system. Another metric of interest is the maximum data rate
for all sampling schemes within a general class of
nonuniform sampling systems. This motivates us to define the
sub-Nyquist sampled channel capacity for the class of linear
time-preserving systems as follows.

Definition 6 (Sampled Capacity under Time-preserving
Linear Sampling). (1) $C(f_s, P)$ is said to be the sampled
capacity of an analog channel under time-preserving
linear sampling for a given sampling rate $f_s$ if $C(f_s, P) =
\sup_{\mathcal{P}} C^{\mathcal{P}}(f_s, P)$;

(2) $C_u(f_s, P)$ is said to be a capacity upper bound of
the sampled channel under this sampling if $C_u(f_s, P) \ge
\sup_{\mathcal{P}} \limsup_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$.

Here, the supremum over $\mathcal{P}$ is over all time-preserving
linear sampling systems.

The above definition of sampled capacity characterizes the 
fundamental capacity limit of an analog channel over a large 
set of different sampling mechanisms subject to a sampling 
rate constraint. This poses the problem of jointly optimizing 
the input and the sampling architecture to achieve capacity. In 
the next section, we will develop a tight capacity upper bound 
for the class of time-preserving sampling systems along with 
capacity-achieving sampling strategies. 

III. Main Results 

A. An Upper Bound on Sampled Channel Capacity 

1) The Converse: A time-preserving sampling system
preserves the time scale of the signal, and hence does not
compress or expand the frequency response. We now determine
an upper limit on the sampled channel capacity for this
class of general nonuniform sampling systems. Fact 1 implies
that any multibranch sampling system can be converted to a
single branch sampling system without loss of information.
Therefore, we restrict our analysis in this section to the class 
of single branch sampling systems, which provides exactly 
the same upper bound as the one accounting for multibranch 
systems. In addition, we constrain our attention to sampling 
methods that are right-invertible, as defined below. 

Definition 7 (Right-Invertible Sampling System). A
sampling system with sampling set $\Lambda$ and impulse response
$q(t_i, \tau)$ ($t_i \in \Lambda$) is said to be right-invertible if, for any index
set $\mathcal{I} = \{ l_1, l_2, \cdots, l_N \}$ for some integer $N$ and its associated
sampling subset

$$ \Lambda_{\mathcal{I}} = \{ t_{l_i} \mid l_i \in \mathcal{I} \} \subset \Lambda, $$

the $N$-dimensional square matrix $W_N$ is invertible. Here, $W_N$
is the Gramian matrix associated with the kernel $q(\cdot, \cdot)$ and
defined by

$$ (W_N)_{i,j} = \int_{-\infty}^{\infty} q(t_{l_i}, \tau)\, q^*(t_{l_j}, \tau)\, d\tau. $$
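The Gramian test in Definition 7 can be illustrated numerically by discretizing the integral over $\tau$. In the sketch below, the Gaussian filter kernel, the integration grid, and the sample times are all assumptions introduced for illustration; the paper does not prescribe any of them.

```python
import numpy as np

def gramian(kernel, sample_times, tau):
    """W[i, j] ~ integral of q(t_i, tau) * conj(q(t_j, tau)) d tau (Riemann sum)."""
    d_tau = tau[1] - tau[0]
    Q = np.array([kernel(t, tau) for t in sample_times])   # one row per sample time
    return (Q @ Q.conj().T) * d_tau

def gaussian_filter_kernel(t, tau, width=0.5):
    """Hypothetical LTI preprocessor: q(t, tau) = g(t - tau) with a Gaussian g."""
    return np.exp(-((t - tau) ** 2) / (2 * width ** 2))

tau = np.linspace(-20.0, 20.0, 8001)
distinct_times = np.array([0.0, 1.0, 2.5, 4.0])
repeated_times = np.array([0.0, 1.0, 1.0, 4.0])    # the same sample taken twice

W_distinct = gramian(gaussian_filter_kernel, distinct_times, tau)
W_repeated = gramian(gaussian_filter_kernel, repeated_times, tau)

rank_distinct = np.linalg.matrix_rank(W_distinct)
rank_repeated = np.linalg.matrix_rank(W_repeated)
```

With distinct sample times the Gramian has full rank, whereas repeating a sample time produces two identical kernel rows and a rank-deficient Gramian, i.e. a sample that is fully determined by another one.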



The right-invertibility of the sampling system implies that
each subset of the impulse responses $\{ q(t_i, \tau) \mid i \in \mathcal{I} \}$ is a
linearly independent family, so that no single sample
can be perfectly determined by a set of other samples. For
example, when pointwise uniform sampling is employed, each
subset $\{ q(t_i, \tau) \mid i \in \mathcal{I} \} = \{ \delta(\tau - t_i) \mid i \in \mathcal{I} \}$ is a linearly
independent family, and hence each additional sample contains
innovative information. Our main theorem is now stated as
follows.

Theorem 2 (Converse). Consider any time-preserving
right-invertible sampling system with sampling rate $f_s$. Suppose that
$S_\eta(f) \ne 0$ for every $f$. Assume that there exists a small

constant e > such that T^^ | f--^^ \ it) = O (tA^). 

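Suppose that there exists a frequency set $B_m$ that satisfies
$\mu(B_m) = f_s$ and

$$ \int_{f \in B_m} \frac{|H(f)|^2}{S_\eta(f)}\, df = \sup_{B : \mu(B) = f_s} \int_{f \in B} \frac{|H(f)|^2}{S_\eta(f)}\, df, $$

where $\mu(\cdot)$ denotes the Lebesgue measure. Then the sampled
channel capacity can be upper bounded by

$$ C_u(f_s, P) = \int_{f \in B_m} \frac{1}{2} \left[ \log\left( \nu \frac{|H(f)|^2}{S_\eta(f)} \right) \right]^+ df, \tag{4} $$

where $[x]^+ = \max(x, 0)$ and $\nu$ satisfies

$$ \int_{f \in B_m} \left[ \nu - \frac{S_\eta(f)}{|H(f)|^2} \right]^+ df = P. \tag{5} $$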



Remark 1. Note that $C_u(f_s, P)$ is monotonically
nondecreasing in $f_s$ and $P$. In fact, when the sampling rate is increased
from $f_s$ to $f_s + \delta$, $C_u(f_s + \delta, P)$ corresponds to the optimal value
when considering all spectral sets of support size $f_s + \delta$.
Since we are still allowed to employ (suboptimal) strategies that
allocate power over smaller spectral sets of size $f_s$, we are
optimizing over a larger set of transmission / power allocation
strategies than in the situation with sampling rate $f_s$. Therefore,
$C_u(f_s, P)$ is nondecreasing in $f_s$.

In other words, the upper limit is equivalent to the maximum
capacity of a channel whose spectral occupancy is no larger
than $f_s$. The above result implies that even if we
allow for more complex irregular sampling sets, the sampled
capacity cannot exceed the analog capacity obtained
when constraining all transmit signals to the frequency set
of size $f_s$ that experiences the highest SNR. Accordingly,
the optimal input distribution will lie in this frequency set.
This theorem also indicates that the capacity is attained when
aliasing is suppressed by the sampling structure, as will
be seen later in our capacity-achieving scheme. Once the
optimal frequency set is selected, a water-filling power
allocation strategy is performed over the spectral domain with
water level $\nu$.
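The bound in (4) and (5) can be evaluated numerically in two steps: select the frequency cells with the highest SNR until their total measure equals $f_s$, then water-fill over those cells. The sketch below does this on a uniform frequency grid. The function name, the bisection search for $\nu$, the use of base-2 logarithms, and the example SNR profile are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np

def sampled_capacity_upper_bound(f, snr, f_s, P, iters=200):
    """Evaluate (4)-(5) on a uniform frequency grid.

    f   : uniform frequency grid
    snr : |H(f)|^2 / S_eta(f) on that grid (strictly positive)
    f_s : sampling rate, i.e. the measure of the selected frequency set
    P   : power budget
    """
    df = f[1] - f[0]
    n_keep = int(round(f_s / df))               # grid cells with total measure f_s
    keep = np.argsort(snr)[::-1][:n_keep]       # B_m: cells with the highest SNR
    inv_snr = 1.0 / snr[keep]

    # Bisection on the water level nu so that the allocated power equals P.
    lo, hi = 0.0, inv_snr.max() + P / (n_keep * df)
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        used = np.sum(np.maximum(nu - inv_snr, 0.0)) * df
        lo, hi = (nu, hi) if used < P else (lo, nu)
    nu = 0.5 * (lo + hi)

    # Equation (4), with log taken to base 2 (an assumption about units).
    capacity = np.sum(0.5 * np.maximum(np.log2(nu / inv_snr), 0.0)) * df
    return capacity, nu, np.sort(f[keep])

f = np.linspace(-2.0, 2.0, 4001)
snr = 10.0 * np.exp(-((np.abs(f) - 1.0) ** 2) / 0.05) + 0.1   # hypothetical SNR profile
c_low, nu_low, band_low = sampled_capacity_upper_bound(f, snr, f_s=0.5, P=1.0)
c_high, nu_high, band_high = sampled_capacity_upper_bound(f, snr, f_s=1.0, P=1.0)
```

Comparing `c_low` with `c_high` gives a numerical check of Remark 1: enlarging the sampling rate enlarges the admissible frequency set, so the bound does not decrease.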

2) Approximate Analysis: To give the reader some
intuition into the results, we provide an approximate (but
non-rigorous) argument as follows. Two of the key ideas that
enable the analysis are "noise whitening" and "orthonormal
projection".

Suppose that the Fourier transform of the analog channel
output $r(t)$ is given by $H(f)X(f) + N(f)$, where $X(f)$
and $N(f)$ denote, respectively, the Fourier transforms of $x(t)$
and $\eta(t)$. When the sampled sequence does not collapse
information, we can characterize the sampling process through
a linear injective mapping $\mathcal{R}$ from the space of
functions $H(f)X(f) + N(f) \in \mathcal{L}^2(-\infty, \infty)$ onto the space
$\mathcal{L}^2(-f_s/2, f_s/2)$ of bandlimited functions:

$$ \phi(\cdot) = \mathcal{R}(HX) + \mathcal{R}(N). $$

This way the noise component $\mathcal{R}(N)$ can be treated as
additive sampled noise in the frequency domain. We note,
however, that this Gaussian noise $\mathcal{R}(N)$ is not necessarily
independent over the spectral support $[-f_s/2, f_s/2]$. This
motivates us to whiten it first without loss of information.

Denote by $\mathcal{W}$ the whitening operator and suppose that
$\mathcal{R}(N)$ is bounded away from 0. Then the prewhitening process
is performed as

$$ \mathcal{W}\phi(\cdot) = \mathcal{W}(\mathcal{R}(HX)) + \mathcal{W}(\mathcal{R}(N)) $$

such that the noise component $\mathcal{W}(\mathcal{R}(N))$ is independent
across the frequency domain. If we denote $\tilde{\mathcal{R}}(\cdot) = \mathcal{W}(\mathcal{R}(\cdot))$,
then we can rewrite the input-output relation as

$$ \tilde{\phi}(\cdot) = \tilde{\mathcal{R}}(HX) + \tilde{N}, $$

with $\tilde{N}$ being white over $[-f_s/2, f_s/2]$ and $\tilde{\mathcal{R}}$ being
an orthonormal operator onto $\mathcal{L}^2(-f_s/2, f_s/2)$. That said,
the operator $\tilde{\mathcal{R}}$ effectively projects all spectral components
$H(f)X(f) + N(f)$ onto a subspace $\mathcal{L}^2(-f_s/2, f_s/2)$. Instead
of scrambling spectral contents, the optimal projection that
maximizes the SNR extracts out a spectral set of support
size $f_s$ that contains the frequency components with highest
SNR. This leads to the capacity upper bound (4). The approach
we outline is illustrated in Fig. 2 and will be formally proved
in Section V.



Figure 2. Projection of spectral contents from $\mathcal{L}^2(-\infty, \infty)$ onto
$\mathcal{L}^2(-f_s/2, f_s/2)$; each panel plots SNR($f$) against frequency.
(a) SNR of the analog channel. (b) Optimal projection:
it extracts out a frequency set of size $f_s$ and zeros out all other contents.
(c) A projection that scrambles spectral contents, which does not in general
maximize capacity.



B. Achievability 

It turns out that for most scenarios of interest, the capacity
upper bound given in Theorem 2 can be attained through
filterbank sampling, as stated in the following theorem.

Theorem 3 (Achievability - Sampling with a Filter Bank). 

Suppose that the maximizing frequency set $B_m$ introduced in
Theorem 2 exists and can be divided into

$$B_m = \left(\bigcup_{i \in \mathcal{I}} B_i\right) \cup D,$$

where $\mathcal{I}$ is an index set, $D$ contains a set of singular points,
$B_i$ is a continuous interval, and $D$ and $B_i$ ($i \in \mathcal{I}$) are
non-overlapping sets. The upper bound (4) can then be achieved
by filterbank sampling via a countable number of filters.
Specifically, in the $k$th branch, the frequency response of the
filter is given by

$$S_k(f) = \begin{cases} 1, & \text{if } f \in B_k, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$$

and the filter is followed by an ideal uniform sampler with
sampling rate $\mu(B_k)$.

Proof: See Appendix B. ■

Since the bandwidth of $B_i$ may be irrational and the
system may require an infinite number of filters, the sampling
system has an aperiodic impulse response in general. Theorem 3
indicates that filterbank sampling with varied sampling rates
in different branches maximizes capacity.
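As a rough numerical illustration of the decomposition in Theorem 3, the following Python sketch discretizes the frequency axis, keeps the bins with the highest SNR up to a total width of $f_s$, and merges adjacent bins into intervals $B_k$. The width of each interval plays the role of the branch rate $\mu(B_k)$. The function name, the grid, and the SNR values are hypothetical, and the discretization itself is an assumption that is not part of the theorem.

```python
from typing import List, Tuple


def top_snr_intervals(freqs: List[float], snr: List[float],
                      df: float, fs: float) -> List[Tuple[float, float]]:
    """Keep the highest-SNR bins of total width ~fs and merge adjacent bins.

    freqs[i] is the left edge of bin i (width df); snr[i] is its SNR.
    Returns a list of intervals (lo, hi) approximating the sets B_k.
    """
    n_keep = round(fs / df)  # number of bins whose total width equals fs
    by_snr = sorted(range(len(freqs)), key=lambda i: snr[i], reverse=True)
    keep = sorted(by_snr[:n_keep])  # selected bins in increasing frequency

    intervals: List[Tuple[float, float]] = []
    for i in keep:
        lo, hi = freqs[i], freqs[i] + df
        if intervals and abs(intervals[-1][1] - lo) < 1e-9:
            # Bin touches the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], hi)
        else:
            intervals.append((lo, hi))
    return intervals


# Hypothetical SNR profile on a coarse grid (illustration only).
grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
profile = [1.0, 9.0, 8.0, 2.0, 7.0, 3.0]
branches = top_snr_intervals(grid, profile, df=0.5, fs=1.5)
branch_rates = [hi - lo for lo, hi in branches]  # one uniform sampler per branch
print(branches, branch_rates)
```

In this toy profile the branch rates add up to the overall rate $f_s$, mirroring the fact that the branch rates $\mu(B_k)$ in Theorem 3 sum to the measure of $B_m$.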

Figure 3. (a) Filterbank sampling: each branch filters out a frequency interval
of bandwidth $B_k$, and samples it with rate $f_{k,s} = B_k$; (b) A single branch
of modulation and filtering: the channel output is prefiltered by a filter with
impulse response $p(t)$, modulated by a sequence $q(t)$, post-filtered by another
filter of impulse response $s(t)$, and finally sampled by a uniform sampler at a
rate $f_s$. If the SNR $|H(f)|^2 / S_\eta(f)$ is piecewise flat, then $p(t)$, $q(t)$, and $s(t)$
can be chosen such that the two systems are equivalent in terms of sampled
capacity.

The optimality of filterbank sampling immediately leads
to another optimal sampling structure under mild conditions.
As we have shown in [39], filterbank sampling with equal
rate at different branches can be replaced by a single branch
of modulation, as illustrated in Fig. 3. This approach attains
the maximum capacity achievable by filterbank sampling if
the SNR of the analog channel is piecewise constant in
frequency. Although the filterbank sampling we derive in (6)
does not employ equal rate at different branches, for most
channels of physical interest we can simply divide each branch
further into a number of sub-branches to allow the rates
at different branches to be reasonably close to each other.
Therefore, for most channels of physical interest (say, the
channels whose SNRs in frequency are Riemann-integrable),
the capacity achievable through filterbank sampling can be
approached arbitrarily closely by a single branch of sampling
with modulation. This achievability result is formally stated in
the following theorem.

Theorem 4 (Achievability - A Single Branch of Sampling
with Modulation and Filtering). Under the assumptions
of Theorem 3, suppose further that $|H(f)|^2 / S_\eta(f)$ remains
constant within each set $B_i$. Then there exists a sampling
method using a single branch of sampling with modulation
and filtering that approaches the upper bound (4) arbitrarily
closely.

Proof: See Appendix C. ■
A channel of physical interest can often be approximated
as piecewise constant over frequency in this way. Given the
maximizing frequency set $B_m$, we first suppress the frequency
components outside $B_m$ using an optimal LTI prefilter. A
modulation module is then applied to scramble all frequency
components within $B_m$. The aliasing effect can be significantly
mitigated by appropriate choices of modulation weights for
different spectral subbands. We then employ another band-pass
filter to suppress out-of-band signals, and sample the output
using a pointwise uniform sampler. The detailed optimizing
strategy can be found in Appendix C. Compared with filterbank
sampling, a single branch of modulation and filtering
only requires the design of a lowpass filter, a band-pass filter,
and a multiplication module, which are typically of lower
complexity to implement than a filter bank.



IV. Discussion 

Some properties of the capacity and capacity-achieving 
strategies are now discussed. 

Monotonicity. It can be seen from Theorem 2 that increasing
the sampling rate from $f_s$ to $\tilde{f}_s$ requires us to crop out another
frequency set $\tilde{B}_m$ of support size $\tilde{f}_s$ that has the highest SNRs.
By definition, the original frequency set $B_m$ must be a subset
of $\tilde{B}_m$. Therefore, the sampled capacity with rate $\tilde{f}_s$ is no
lower than the sampled capacity with rate $f_s$.

Irregular sampling set. Sampling with irregular nonuniform
sampling sets, while requiring complicated reconstruction
and interpolation techniques [12], does not outperform
filterbank or modulation bank sampling with regular uniform
sampling sets in maximizing capacity for the channels considered
herein.

Alias suppression. We have seen here that aliasing does 
not allow a higher capacity to be achieved when perfect 
channel state information is known at both the transmitter 
and the receiver. The optimal sampling method corresponds 
to the optimal alias-suppression strategy. This is in contrast 
to the benefits obtained through random mixing of spectral 
components in many sub-Nyquist sampling schemes with 
unknown signal supports. When we are allowed to jointly 
optimize over both input and sampling schemes with perfect 
channel state information, scrambling of spectral contents does 
not in general maximize capacity. 

Perturbation of the sampling set. If optimal filterbank or
modulation sampling is employed, then mild perturbation of
post-filtering uniform sampling sets does not degrade the sampled
capacity. One surprisingly general example was given and
proved by Kadec [44]. Suppose that a sampling rate $f_s$ is used
in any branch and the sampling set $\{\hat{t}_n\}$ satisfies
$\sup_n |\hat{t}_n - n/f_s| < f_s^{-1}/4$.
Then $\{\exp(j 2\pi \hat{t}_n f) \mid n \in \mathbb{Z}\}$ also forms a Riesz basis
of $\mathcal{L}_2(-f_s/2, f_s/2)$, thereby preserving information integrity.
These nonuniform sampling and reconstruction schemes, while
generally complicated to implement in practice, significantly
broaden the class of sampling mechanisms that allow perfect
reconstruction of bandlimited signals, and indicate stability
and robustness of the sampling sets. Kadec's result immediately
implies that the sampled capacity is invariant under mild
perturbation of the sampling sets.
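The perturbation condition can be checked on a finite stretch of a sampling set with a few lines of Python. This is only a sketch: the function name and the example jitter values are hypothetical, and a finite list can only confirm the bound on the samples that are listed, not on the whole infinite set.

```python
from typing import Sequence


def kadec_condition_holds(sample_times: Sequence[float], fs: float,
                          first_index: int = 0) -> bool:
    """Check |t_n - n/fs| < 1/(4 fs) for the listed samples.

    sample_times[i] is the perturbed time of sample n = first_index + i.
    """
    bound = 1.0 / (4.0 * fs)
    return all(abs(t - (first_index + i) / fs) < bound
               for i, t in enumerate(sample_times))


fs = 2.0  # nominal rate, so the Kadec bound is 1/(4 fs) = 0.125
mild = [n / fs + 0.1 * (-1) ** n for n in range(-5, 6)]   # jitter 0.1
large = [n / fs + 0.2 for n in range(-5, 6)]              # jitter 0.2
print(kadec_condition_holds(mild, fs, first_index=-5))
print(kadec_condition_holds(large, fs, first_index=-5))
```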

Hardware implementation. When the sampling rate is
increased from $f_{s1}$ to $f_{s2}$, we need only to insert an additional
filter bank of overall sampling rate $f_{s2} - f_{s1}$ to extract
another set of spectral components with bandwidth $f_{s2} - f_{s1}$.
The adjustment of the sampling hardware system for filterbank
sampling is incremental with no need to rebuild the whole
system from scratch.

Spectrum Blind Sampling. This paper focuses on the
scenario with perfect channel state information known at the
transmitter, the receiver, and the sampler. This is different
from the setting of compressed sensing, where the signal
spectrum is unknown to the sampler and the decoder. In
fact, the alias-suppressing sampler requires knowledge of the
channel. If this knowledge is not available, alias-suppressing
samplers might result in fairly low capacity. When the sampler
is spectrum blind and the channel realization is uncertain,
random sampling that scrambles the spectral contents [32],
[34] outperforms alias-suppressing sampling in minimizing the
rate loss due to channel-independent sampling design. This is
investigated in our companion paper [45].



V. Proof of Theorem 2

The key steps underlying the proof of Theorem 2 are
presented in this section. The proof consists of the following
two steps.

1) We start by analyzing the class of periodic sampling 
systems: a special type of sampling methods that allow 
closed-form capacity expressions. We then derive a 
closed-form upper bound over all periodic sampling 
systems. 

2) The upper bound is then derived by relating general sampled
channels with periodic sampled channels through a
finite-duration approximation.

We rely on Fact 1 that each multibranch sampling system can
be converted to a single branch sampling system without loss
of information. Therefore, we restrict our proof to the class
of single branch sampling systems, which provides the same
upper bound as the one accounting for multibranch sampling
systems.

A. Periodic Sampling Systems 

Recall that for a time varying system, the impulse response
$q(t, \tau)$ is defined as the output seen at time $t$ due to an impulse
in the input at time $\tau$. The sampling system may not be time-invariant,
but a broad class of sampling mechanisms applied in
practice exhibit block-wise time invariance properties. Specifically,
we introduce the notion of periodic sampling systems
as follows.

Definition 8 (Periodic Sampling). Consider a sampling system
with a preprocessing system of impulse response $q(t, \tau)$
followed by a sampling set $\Lambda = \{t_k \mid k \in \mathbb{Z}\}$. A linear
sampling system is said to be periodic with period $T_q$ and
sampling rate $f_s$ ($f_s T_q \in \mathbb{Z}$) if the preprocessing system is
periodic with period $T_q$ and the sampling set satisfies

$$t_{k + f_s T_q} = t_k + T_q, \quad \forall k \in \mathbb{Z}. \qquad (7)$$

Figure 4. The sampling set of a periodic sampling system with period $T_q = 1/f_q$
and sampling rate $f_s$.

In short, a periodic sampling system consists of a periodic
preprocessor followed by a pointwise sampler with a periodic
sampling set, as illustrated in Fig. 4. Since the impulse
response can be arbitrary within a period, this allows us to
model multibranch sampling methods with each branch using
the same sampling rate. Periodic sampling schemes subsume
as special cases a broad class of sampling techniques, e.g.
sampling via filter banks, sampling via periodic modulation,
and recurrent nonuniform sampling [34], [46].
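Condition (7) is easy to test on a finite list of sample times. The sketch below is a minimal illustration under the assumption that the times are sorted and cover a few periods; the function name, the tolerance, and the example offsets (a recurrent nonuniform pattern with three samples per period) are hypothetical.

```python
from typing import Sequence


def is_periodic_sampling_set(t: Sequence[float], fs: float, Tq: float,
                             tol: float = 1e-9) -> bool:
    """Check t[k + fs*Tq] == t[k] + Tq over a finite, sorted list of times."""
    m = round(fs * Tq)  # samples per period; fs*Tq must be an integer
    if m <= 0 or abs(fs * Tq - m) > tol:
        return False
    return all(abs(t[k + m] - (t[k] + Tq)) <= tol
               for k in range(len(t) - m))


# Recurrent nonuniform sampling: the same three offsets repeat every Tq.
Tq = 1.0
offsets = [0.0, 0.1, 0.55]
times = [p * Tq + o for p in range(4) for o in offsets]
print(is_periodic_sampling_set(times, fs=3.0, Tq=Tq))

jittered = list(times)
jittered[4] += 0.05  # break the pattern in the second period
print(is_periodic_sampling_set(jittered, fs=3.0, Tq=Tq))
```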

The periodicity of the sampling system renders the linear
operator associated with the whole system block Toeplitz.
The asymptotic spectral properties of block Toeplitz operators
(e.g. [47]) guarantee the existence of $\lim_{T \to \infty} C_T^{\mathcal{P}}(f_s, P)$ for
a given periodic sampling system $\mathcal{P}$, and allow a capacity
expression to be obtained through the Fourier representation.
Denote by $Q_k(f)$ the Fourier transform of the impulse response
$q(t_k, t_k - t)$ of the sampling system, i.e. $Q_k(f) :=
\int_{-\infty}^{\infty} q(t_k, t_k - t) \exp(-j 2\pi f t)\, dt$. We further introduce an
$f_s T_q \times \infty$ dimensional Fourier series matrix $F_q(f)$ associated
with the sampling system, and another infinite diagonal square
matrix $F_h(f)$ associated with the channel response. For all
$m, l \in \mathbb{Z}$ and $1 \le k \le f_s T_q$, we set

$$(F_q)_{k,l}(f) := Q_k(f + l f_q), \qquad (F_h)_{l,l}(f) := \frac{H(f - l f_q)}{\sqrt{S_\eta(f - l f_q)}}.$$

We can then express the sampled analog capacity for a given
periodic system $\mathcal{P}$ in closed form as follows.

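Theorem 5. Suppose the sampling system $\mathcal{P}$ is periodic
with period $T_q$ and sampling rate $f_s$, where $f_s T_q \in \mathbb{Z}$.
Let $f_q := 1/T_q$. Assume that, for every $f$,
$|H(f) Q_k(f)|^2 / S_\eta(f)$ is bounded and satisfies
$\int_{-\infty}^{\infty} |H(f) Q_k(f)|^2 / S_\eta(f)\, df < \infty$ for all $1 \le k \le f_s T_q$, and
that $F_q F_q^*$ is invertible.

(1) When perfect CSI is known at both the transmitter and the
receiver, the sampled channel capacity with sampling rate $f_s$
is given by

$$C^{\mathcal{P}}(f_s, P) = \frac{1}{2} \int_{-f_q/2}^{f_q/2} \sum_{i=1}^{f_s T_q} \left[\log(\nu \lambda_i)\right]^+ df, \qquad (8)$$

where $\nu$ satisfies

$$\int_{-f_q/2}^{f_q/2} \sum_{i=1}^{f_s T_q} \left[\nu - \frac{1}{\lambda_i}\right]^+ df = P.$$

Here, $[x]^+$ denotes $\max(x, 0)$, and

$$\lambda_i = \lambda_i\left\{(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}\right\}$$

denotes the $i$th largest eigenvalue of the matrix
$(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}$.

(2) Suppose that $H(f) = 0$ for any $f \notin [0, W]$. Assume that
perfect CSI is known at both the transmitter and the receiver,
and that the transmitter employs equal power allocation over
$[0, W]$. Then the sampled channel capacity with sampling rate
$f_s$ is given by

$$C^{\mathcal{P}}_{\text{non-wf}}(f_s, P) = \frac{1}{2} \int_{-f_q/2}^{f_q/2} \log\det\left(I + \frac{P}{W} (F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}\right) df. \qquad (9)$$

Proof: See Appendix D. ■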



Remark 2. We note that a periodic sampling system can
be equivalently converted to filterbank sampling with equal
rate at each branch. Then, the results derived in [39] immediately
lead to (8). However, our results in [39] require
$|H(f) Q_k(f)|^2 / S_\eta(f)$ to lie in $\mathcal{L}_1$. While the analysis
framework in [39] makes use of a discretization argument and
spectral properties of block-Toeplitz matrices, we provide in
this paper a different proof with the same general approach
but based on operator analysis. This allows us to extend our
previous results to any channel with $|H(f) Q_k(f)|^2 / S_\eta(f)$
lying in $\mathcal{L}_2$.

In Theorem 5, $\nu$ is the water level with respect to the optimal
water-filling power allocation strategy over the eigenvalues
of the matrix $(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}$. The
capacity expression can be interpreted by relating the sampled
channel to a frequency-selective MIMO Gaussian channel with
a countable number of transmit branches and $f_s T_q$ receive
branches. The capacity-achieving transmission strategy is the
one that decouples the mutual interference and
converts the MIMO channel into a set of $f_s T_q$ parallel
non-interfering Gaussian channels. These parallel channels
have respective channel gains equal to the eigenvalues of the
associated system matrix, and hence water filling over these
eigenvalues yields the maximum information rate.
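The water-filling step can be sketched numerically as follows. This is an illustration under stated assumptions rather than the procedure used in the proofs: the eigenvalues are taken as a finite list (for example, the eigenvalues on a frequency grid with spacing `df`), the water level is found by bisection, and the function name and the example values are hypothetical. Rates are in nats because the natural logarithm is used.

```python
import math
from typing import List, Tuple


def water_fill(eigs: List[float], power: float,
               df: float = 1.0) -> Tuple[float, float]:
    """Find nu with df * sum([nu - 1/lam]^+) = power, then evaluate
    0.5 * df * sum([log(nu * lam)]^+).

    Returns (water level nu, rate in nats).
    """
    inv = [1.0 / lam for lam in eigs if lam > 0.0]  # inverse gains
    if not inv or power <= 0.0:
        return 0.0, 0.0
    lo, hi = min(inv), max(inv) + power / df  # bracket for the water level
    for _ in range(200):  # bisection on the power constraint
        nu = 0.5 * (lo + hi)
        used = df * sum(max(nu - x, 0.0) for x in inv)
        if used > power:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    rate = 0.5 * df * sum(max(math.log(nu / x), 0.0) for x in inv)
    return nu, rate


# Three parallel channels; the weakest one receives no power here.
level, rate = water_fill([2.0, 1.0, 0.1], power=1.0)
print(level, rate)
```

With gains 2, 1 and 0.1 and unit power, only the two strongest channels are active, which is the behavior expressed by the $[\cdot]^+$ operators in (8).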

Similar to Gallager's approach, we provide here an
argument that relates periodic sampling systems to sampling
with a filter bank, and defer the complete rigorous proof to
Appendix D. Instead of directly studying the single branch
nonuniform sampling set, we divide the sampling set into $f_s T_q$
subsets, with the $k$th ($1 \le k \le f_s T_q$) sampling subset given
by

$$\Lambda_k := \{t_k + n T_q \mid n \in \mathbb{Z}\},$$

which is uniformly spaced with sampling rate $f_q$. From
the periodicity of the sampling system, the whole system
is equivalent to having $f_s T_q$ branches of sampling with
filtering, where the impulse response $s_k(t)$ of the filter in
the $k$th branch is given by $s_k(t) := q(t_k, t_k - t)$. Due
to aliasing, the frequency components over the aliased set
$\{f + l f_q \mid l \in \mathbb{Z}\}$ among different branches will be coupled
together, thus forming countably many transmit branches,
whereas the $f_s T_q$ sampling subsets correspond to $f_s T_q$ receive
branches. Using the approximate analysis provided in [39,
Section IV-B] immediately leads to the capacity expression
(8) and the optimality of water-filling power allocation over
the eigenvalues of $(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}$.

For the class of periodic systems with common sampling 
rate and period, the maximum capacity can be identified as 
follows. 

Corollary 1. Consider the setup and assumptions in Theorem
5. Under all periodic sampling systems with period $T_q$, the
sampled channel capacity with sampling rate $f_s$ can be
bounded above by

$$C_u(f_s, P) = \frac{1}{2} \int_{-f_q/2}^{f_q/2} \sum_{i=1}^{f_s T_q} \left[\log\left(\nu \lambda_i\{F_h F_h^*\}\right)\right]^+ df, \qquad (10)$$

where $\nu$ satisfies

$$\int_{-f_q/2}^{f_q/2} \sum_{i=1}^{f_s T_q} \left[\nu - \frac{1}{\lambda_i\{F_h F_h^*\}}\right]^+ df = P.$$

This upper limit can be attained through sampling via a bank
of $f_s T_q$ filters, each followed by pointwise uniform sampling
at rate $f_q$. The frequency response of the $k$th ($1 \le k \le f_s T_q$)
filter is given by

$$S_k(f - l f_q) = \begin{cases} 1, & \text{if } \dfrac{|H(f - l f_q)|^2}{S_\eta(f - l f_q)} = \lambda_k\{F_h(f) F_h^*(f)\}, \\ 0, & \text{otherwise}. \end{cases}$$

¹This approximate analysis provides a more informative understanding by
relating to the well-known MIMO capacity, but falls short of mathematical
rigor, especially when it comes to the convergence of the Fourier transform. A
rigorous approach makes use of the asymptotic properties in Toeplitz theory,
which allows us to circumvent the mathematical issues when passing to the
limit $T \to \infty$.

Proof: Following the same steps as in [39, Proposition
1], we can see that the $i$th largest eigenvalue satisfies

$$\lambda_i\left\{(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}\right\} \le \lambda_i\{F_h F_h^*\},$$

which immediately leads to (10). ■

Corollary 1 indicates that the filter bank optimizing sampled
capacity extracts the $f_s T_q$ frequencies with the highest
SNR from the aliased set $\{f - l f_q \mid l \in \mathbb{Z}\}$, and suppresses
all other frequency components. For a given period $T_q$, no
other periodic sampling mechanism can outperform filterbank
sampling in maximizing capacity. We note, however, that this
filter bank is optimal in terms of an information-theoretic
metric of capacity without accounting for implementation
complexity and energy consumption. Other periodic sampling
mechanisms (e.g. a single branch of sampling with modulation
and filtering), while achieving the same capacity, may provide
implementation gain and energy efficiency.
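The selection rule described above can be sketched for a single base frequency. The Python snippet below ranks the SNR over a truncated aliased set $\{f - l f_q\}$ and returns the shifts passed by the filters, strongest first. The function name, the truncation of $l$ to a finite range, and the lowpass-shaped SNR profile are assumptions made purely for illustration.

```python
from typing import Callable, Iterable, List


def passband_shifts(snr: Callable[[float], float], f: float, fq: float,
                    m: int, shifts: Iterable[int]) -> List[int]:
    """Return the m shifts l whose aliases f - l*fq have the largest SNR.

    Entry k of the result is the alias passed by filter k+1, so the list
    is ordered from the strongest alias to the weakest selected one.
    """
    ranked = sorted(shifts, key=lambda l: snr(f - l * fq), reverse=True)
    return ranked[:m]


def lowpass_snr(f: float) -> float:
    """Hypothetical SNR profile that decays with |f|."""
    return 1.0 / (1.0 + f * f)


# Two filters (fs*Tq = 2), aliasing period fq = 1, shifts truncated to [-3, 3].
print(passband_shifts(lowpass_snr, 0.25, 1.0, 2, range(-3, 4)))
print(passband_shifts(lowpass_snr, -0.25, 1.0, 2, range(-3, 4)))
```

Repeating this for every $f$ in $[-f_q/2, f_q/2]$ traces out the passbands of the $f_s T_q$ filters.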

Corollary 1 characterizes a tight upper limit on the sampled
capacity for the class of periodic sampling systems but only
for a fixed period and sampling rate. If we are allowed to
vary the period, then we can immediately get another, looser
upper bound that is independent of the period of the sampling
mechanisms. This upper bound will assist us in developing a
tight upper limit on sampled capacity for a general class of
nonuniform sampling techniques.

Corollary 2. Suppose that there exists a frequency set $B_m$
that satisfies $\mu(B_m) = f_s$ and

$$\int_{f \in B_m} \frac{|H(f)|^2}{S_\eta(f)}\, df = \sup_{B : \mu(B) = f_s} \int_{f \in B} \frac{|H(f)|^2}{S_\eta(f)}\, df,$$

where $\mu(\cdot)$ denotes the Lebesgue measure. Under all periodic
sampling systems with any given period $T_q$, the sampled
channel capacity with sampling rate $f_s$ can be upper bounded
by

$$\frac{1}{2} \int_{f \in B_m} \left[\log\left(\nu_u \frac{|H(f)|^2}{S_\eta(f)}\right)\right]^+ df, \qquad (11)$$

where $\nu_u$ satisfies

$$\int_{f \in B_m} \left[\nu_u - \frac{S_\eta(f)}{|H(f)|^2}\right]^+ df = P.$$

Proof: For any given $f_q$, the upper bound (10) is obtained
by extracting a certain frequency set $B$ that has measure
$\mu(B) = f_s$ and suppressing all spectral components outside
$B$. By our definition of $B_m$, any choice of $B$ with spectral
size $f_s$ will not outperform $B_m$. Hence, choosing $B = B_m$
leads to a universal upper bound. ■
The upper bound of Corollary 2 is obtained by constraining
the transmit signals to the frequency set of support size $f_s$ that
possesses the highest SNR. This is exactly our upper bound in
Theorem 2. In general, the exact bound is not achievable by
any given periodic sampling system, but can be approached
arbitrarily closely through a sequence of periodic systems.
Encouragingly, when we go beyond periodic sampling systems,
the above bound is often achievable through some time-preserving
sampling system, as shown in Theorem 3.

B. General Upper Bound 



The next step is to verify whether (10) is an upper bound
even when we consider all time-preserving sampling systems
under a given sampling rate constraint. For each channel
and each given sampling system, the impulse response is 
in general aperiodic. However, instead of studying the true 
sampled channel response directly, we can truncate the channel 
response so that its impulse response is nonzero only for a 
finite duration. The capacity bound for this truncated channel 
can be studied by looking at a new periodized channel we 
construct. As we show, the capacity of the truncated channel 
can be made arbitrarily close to the capacity of the true 
sampled channel. 

Recall that $g(t) = \mathcal{F}^{-1}\left(H(f) / \sqrt{S_\eta(f)}\right)$. The key steps
underlying our proof are outlined below.

1) Finite-duration $g(t)$. Consider first channels for which
$g(t)$ is of finite duration, i.e. $g(t) = 0$, $\forall |t| > L_0$ for some
$L_0 > 0$.

a) Consider any given time-preserving sampling system
$\mathcal{P}$ with impulse response $q(t, \tau)$, and suppose
that the input $x(t)$ is time constrained to the
interval $[-T, T]$. Construct a new periodic channel
with period $2(T + L_0)$ based on $q(t, \tau)$, and
denote by $C^p_{T+L_0}$ the capacity of the periodized channel.

b) Show that the capacity $C_T^{\mathcal{P}}$ of the original sampled
channel is upper bounded in terms of $C^p_{T+L_0}$ uniformly for all $\mathcal{P}$.
Since we know that $C^p_{T+L_0} \le C_u$ for any periodized
channel (or, equivalently, any channel followed by
a periodic sampling system), this proves the capacity
upper bound for this class of finite-duration
channels.

2) Infinite-duration $g(t)$. We next extend the results to
channels for which $g(t)$ is of infinite duration.

a) Construct a truncated channel such that

$$\tilde{g}(t) = \begin{cases} g(t), & \text{if } |t| \le L_1, \\ 0, & \text{else}, \end{cases}$$

for some sufficiently large $L_1$. The capacity upper
bound holds for the truncated channel, as shown
in Step 1).

b) For any given sampling system $\mathcal{P}$ and any time
interval $[-T, T]$, compare the capacity of the original
channel (denoted by $C_T^{\mathcal{P}}$) with the capacity
of the truncated channel (denoted by $\tilde{C}_T^{\mathcal{P}}$), which
can be done by investigating the spectrum of the
operators associated with both sampled channels.
It can be shown that $C_T^{\mathcal{P}}$ can be upper bounded
by $(1 + \delta) \tilde{C}_T^{\mathcal{P}}$ for arbitrarily small $\delta$, which holds
uniformly over all sampling systems $\mathcal{P}$. Combining
this with results shown in Step 1), we conclude
that the capacity bound holds for the whole class
of infinite-duration channels.

The details of the proof are deferred to Appendix A.

VI. Concluding Remarks 

We developed the maximum achievable information rate 
for a general class of time-preserving nonuniform sampling 
methods under a sampling rate constraint. It is shown that 
the nonuniformly spaced sampling sets, while requiring fairly 
complicated reconstruction / approximation algorithms, do not 
provide any capacity gain. Encouragingly, filterbank sampling 
with varied sampling rate at different branches, or a single 
branch of sampling with modulation and filtering, are sufficient 
to achieve the sampled channel capacity, and both strategies 
suppress aliasing effects. In terms of maximizing capacity, 
there is no need to employ irregular sampling sets that are 
more complicated to implement in practical hardware systems. 
The resulting sampled capacity is shown to be monotonically 
increasing in sampling rate. 

Our results in this paper are based on the assumption that
perfect channel state information is known at both the transmitter
and the receiver. It remains to be seen what sampling
strategies can optimize information rates when only partial
channel state information is known. It is unclear whether anti-aliasing
methods are still optimal in maximizing capacity.
Moreover, when it comes to the multi-user information theory 
setting, anti-aliasing methods might not outperform other 
spectral-mixing approaches in the entire capacity region. It 
would be interesting to see how to optimize the sampling 
schemes in multi-user channels, for example, the joint sam- 
pling schemes in sampled multiple access analog channels. 

Appendix A 

Proof of General Upper Bound for Theorem 2
We prove the theorem first for the special case where
$g(t) = \mathcal{F}^{-1}\left(H(f) / \sqrt{S_\eta(f)}\right)$ is of finite duration, and then
generalize the proof to the case in which $g(t)$ is of infinite
duration.

A. Finite-duration g{t) 

In this subsection, we focus on the channel whose impulse
response is of finite duration $2L_0$, i.e. $g(t) \neq 0$ only when
$|t| < L_0$. For convenience of presentation, our proof is mainly
devoted to the channel in the presence of white additive
noise. The result extends to colored-noise scenarios without
difficulty, as we demonstrate in Appendix A-A2.

1) $h(t)$ of finite duration and $\eta(t)$ white: Consider the
situation where the noise $\eta(t)$ is white and the channel
response $h(t)$ (and hence $g(t)$) is of finite duration $2L_0$. Our
goal is to prove that the capacity upper bound (4) holds for
this type of channel.

For any transmission block $[T_0 - T, T_0 + T]$, we call the
transmit signal $x(t)$ over this block a codeword of code length
$2T$. The information conveyed through these finite-duration
codewords can be bounded via the true analog channel capacity,
if we can preclude the effect of inter-symbol interference.
The key here is to separate different codewords with a guard
zone of sufficient length and then use capacity-achieving
strategies separately for each codeword. When the code length
$2T$ is sufficiently large, the loss of transmission time due to
guard zones becomes negligible, which in turn allows us to
approach the true capacity arbitrarily well.
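To make the vanishing overhead concrete, the short Python sketch below computes the fraction of each period $2(T + L_0)$ that is spent in the guard zone of length $2L_0$, and the block half-length beyond which $(T + L_0)/T < 1 + \delta$ holds. The function names and the numbers are hypothetical; the threshold $L_0/\delta$ follows from rearranging that inequality.

```python
def guard_fraction(T: float, L0: float) -> float:
    """Fraction of one period 2(T + L0) occupied by the guard zone 2*L0."""
    return L0 / (T + L0)


def min_half_length(L0: float, delta: float) -> float:
    """Threshold T0 such that (T + L0)/T < 1 + delta for every T > T0.

    (T + L0)/T < 1 + delta is equivalent to T > L0/delta; the companion
    condition T/(T + L0) > 1 - delta is then satisfied as well.
    """
    return L0 / delta


L0, delta = 1.0, 0.1
T0 = min_half_length(L0, delta)
for T in (1.05 * T0, 10 * T0, 100 * T0):
    print(T, guard_fraction(T, L0), (T + L0) / T)
```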

Step 1. Consider an input $x(t)$ that is constrained to the
interval $[-T, T]$. Since $h(t)$ is of finite duration $2L_0$, the
channel output $r(t) = h(t) * x(t) + \eta(t)$ will be affected by
the input only when $t \in [-T - L_0, T + L_0]$. Define a window
operator and its complement operator such that

$$w_T(f(t)) = \begin{cases} f(t), & \text{if } |t| \le T + L_0, \\ 0, & \text{else}; \end{cases}$$

and

$$w_T^{\perp}(f(t)) = \begin{cases} 0, & \text{if } |t| \le T + L_0, \\ f(t), & \text{else}. \end{cases}$$

Then for any linear sampling operator $\mathcal{P}$ with impulse
response $q(t, \tau)$, the sampled output is $\mathcal{P}(r(t)) =
\mathcal{P}(w_T(r(t))) + \mathcal{P}(w_T^{\perp}(r(t)))$. One can easily observe that
the component $\mathcal{P}(w_T^{\perp}(r(t)))$ contains no information about
$x(t)$, and is statistically independent of $\mathcal{P}(w_T(r(t)))$ in the
presence of white noise. Therefore, it suffices to restrict our
attention to the class of sampling systems whose system input
is constrained to the interval $[-T - L_0, T + L_0]$.

Step 2. Construct a periodization of the above sampled
channel model with finite input duration. Set the impulse
response $q^p_{T+L_0}(t, \tau)$ of the preprocessor of the periodized
sampling system to be a periodic extension of $q(t, \tau)$ in the
block $[-T - L_0, T + L_0] \times [-T, T]$. Specifically, if $\tau =
k \cdot 2(T + L_0) + \tau_r$ for some $k \in \mathbb{Z}$ and $\tau_r \in [-T - L_0, T + L_0]$,
then

$$q^p_{T+L_0}(t, \tau) = \begin{cases} q(t - 2k(T + L_0), \tau_r), & \text{if } |t - 2k(T + L_0)| \le T + L_0, \\ 0, & \text{else}. \end{cases} \qquad (12)$$

Clearly, $q^p_{T+L_0}(t, \tau)$ corresponds to a periodic preprocessing
system with period $2(T + L_0)$.

Suppose without loss of generality that the indices of the
sample times that fall in $[-T - L_0, T + L_0]$ are $0, 1, \cdots, K - 1$,
i.e. $\{k \mid t_k \in [-T - L_0, T + L_0]\} = \{0, 1, \cdots, K - 1\}$. We
can then set the sampling set $\Lambda^p_{T+L_0}$ of the periodized system
such that for any sampling time $\hat{t}_k \in \Lambda^p_{T+L_0}$, we have

$$\hat{t}_k = t_{k \bmod K} + 2 \left\lfloor \frac{k}{K} \right\rfloor (T + L_0), \qquad (13)$$

where $\lfloor x \rfloor = \max\{n \mid n \in \mathbb{Z}, n \le x\}$. Clearly, this forms a
periodic sampling set with period $2(T + L_0)$. The definition
of Beurling density ensures that for any $\epsilon > 0$, there exists a
$T_D$ such that for every $T > T_D$,

$$f_s - \epsilon < D\left(\Lambda^p_{T+L_0}\right) < f_s + \epsilon.$$

We note that our construction (12) guarantees that the input
$x(t)$ within time interval $[2k(T + L_0) - T, 2k(T + L_0) + T]$
will only affect the sampled output at the $k$th time block
$[(2k - 1)(T + L_0), (2k + 1)(T + L_0)]$, as illustrated in Fig.
5. Since the noise $\eta(t)$ is assumed to be white, the noise
components across different time blocks are independent. In
fact, the intervals $[2k(T + L_0) + T, 2(k + 1)(T + L_0) - T]$
($k \in \mathbb{Z}$) act effectively as guard zones in order to avoid leakage
signals across different time blocks.
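The periodized sampling set can be generated directly from the sample times that fall inside one window: sample $k$ of the periodized set reuses window sample $k \bmod K$ and is shifted by $\lfloor k/K \rfloor$ periods of length $2(T + L_0)$. The Python sketch below illustrates this index arithmetic; the function name and the example window are hypothetical.

```python
from typing import Sequence


def periodized_time(k: int, window_times: Sequence[float],
                    T: float, L0: float) -> float:
    """k-th sample time of the periodized set.

    window_times holds the K original sample times inside
    [-T - L0, T + L0], indexed 0..K-1. Python's % and // already use
    floor semantics, so negative k is handled consistently.
    """
    K = len(window_times)
    return window_times[k % K] + 2 * (k // K) * (T + L0)


# Four irregular samples inside [-1.5, 1.5] (T = 1.0, L0 = 0.5).
window = [-1.2, -0.3, 0.4, 1.1]
print([round(periodized_time(k, window, 1.0, 0.5), 3) for k in range(-4, 8)])
```

Every block of $K$ consecutive samples is a copy of the window pattern, so the resulting set is periodic with period $2(T + L_0)$ by construction.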



Figure 5. The code words of duration $2T$ are separated by guard
zones of duration $2L_0$. There is no intersymbol interference among different
observation intervals.



Using the above argument, we can separate code words of
duration $2T$ in $[2k(T + L_0) - T, 2k(T + L_0) + T]$ ($k \in \mathbb{Z}$)
on the analog channel by a guard zone of duration $2L_0$ (as illustrated
in Fig. 5). The ratio of guard space to the length of the
time block vanishes as $T \to \infty$. Additionally, there is no
intersymbol interference under the new system we construct.
By our capacity definition, for any $\delta > 0$, there exists a $T_0$
such that $\forall T > T_0$, we have

$$\frac{T + L_0}{T} < 1 + \delta, \quad \text{and} \quad \frac{T}{T + L_0} > 1 - \delta.$$

One can then verify that

$$C_T^{\mathcal{P}}(f_s, P) \overset{(i)}{\le} \frac{T + L_0}{T}\, C^p_{T+L_0}\left(D\left(\Lambda^p_{T+L_0}\right), \frac{T}{T + L_0} P\right) \le (1 + \delta)\, C^p_{T+L_0}\left(D\left(\Lambda^p_{T+L_0}\right), (1 - \delta) P\right),$$

where $C^p_{T+L_0}$ denotes the capacity under our periodized system.
The inequality (i) follows from the following arguments:
(a) $C_T^{\mathcal{P}}(f_s, P)$ is the information rate when we observe the
samples within the interval $[-T, T]$, which is smaller than the
information rate (which we term $\tilde{C}_T^{\mathcal{P}}$) if we can observe all
samples within $[-T - L_0, T + L_0]$; (b) $\tilde{C}_T^{\mathcal{P}}$ is equivalent to
the maximum information rate achievable by the periodized
system, under the constraint that $x(t)$ is zero over the guard
zones. Clearly, this rate will be smaller than the capacity
without this transmission constraint, which is $\frac{T + L_0}{T} C^p_{T+L_0}$. Here,
the multiplication factor $\frac{T + L_0}{T}$ arises from the fact that we
only use a portion $\frac{T}{T + L_0}$ of time for transmission; (c) Correspondingly,
since we allocate an amount of $PT$ power over
each transmission block and zero power over each guard zone,
the average power allocated to the transmitted signal would be
$\frac{T}{T + L_0} P$.



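We know from Corollary 2 that

$$C^p_{T+L_0}\left(D\left(\Lambda^p_{T+L_0}\right), (1 - \delta) P\right) \le C_u\left(D\left(\Lambda^p_{T+L_0}\right), (1 - \delta) P\right).$$

By observing that $C_u(f_s, P)$ is monotonically non-decreasing
in $f_s$ and $P$, we can derive

$$C_T^{\mathcal{P}}(f_s, P) \le (1 + \delta)\, C^p_{T+L_0}\left(D\left(\Lambda^p_{T+L_0}\right), (1 - \delta) P\right) \le (1 + \delta)\, C_u(f_s + \epsilon, P) \qquad (14)$$

whenever $T > \max\{T_0, T_D\}$. Since $\epsilon$ and $\delta$ can be chosen
arbitrarily small, we conclude that

$$\limsup_{T \to \infty} C_T^{\mathcal{P}}(f_s, P) \le C_u(f_s, P)$$

holds for any $\mathcal{P}$, which completes the proof when $h(t)$ is of
finite duration and $\eta(t)$ is white.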

2) $g(t) = \mathcal{F}^{-1}\left(H(f) / \sqrt{S_\eta(f)}\right)$ of finite duration but $\eta(t)$
colored: Now suppose instead that $g(t)$ is of finite duration $L$.
We can then split the channel filter $H(f)$ into two parts with
respective frequency responses $H(f) / \sqrt{S_\eta(f)}$ and $\sqrt{S_\eta(f)}$.
Since the colored noise is equivalent to a white Gaussian noise
passed through a filter with transfer function $\sqrt{S_\eta(f)}$, the
original system can be redrawn as in Fig. 6. Since the filter
$\sqrt{S_\eta(f)}$ can be incorporated into the preprocessing system
to generate a new time-preserving preprocessor, our results
above immediately justify the capacity upper bound (4) for
this class of channels.



Figure 6. Equivalent representation of sampling systems in the presence of
colored noise.



B. Infinite-duration g{t) 

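In this subsection, we investigate the capacity bound when
$g(t) = \mathcal{F}^{-1}\left(H(f) / \sqrt{S_\eta(f)}\right)$ is nonzero for infinite duration.
We would like to prove that for any given sampling system $\mathcal{P}$
and any $\epsilon > 0$, there exists $T_1$ such that for any $T > T_1$, the
sampled capacity $C_T^{\mathcal{P}}(f_s, P)$ exceeds the upper bound $C_u(f_s, P)$
by no more than a term that vanishes with $\epsilon$.

Our proof proceeds by comparing the original channel with
a truncated channel whose channel response $\tilde{H}(f)$ satisfies

$$\tilde{g}(t) := \mathcal{F}^{-1}\left(\frac{\tilde{H}(f)}{\sqrt{S_\eta(f)}}\right) = \begin{cases} 0, & \text{if } |t| > L_1, \\ g(t), & \text{otherwise}. \end{cases}$$

Let $\xi > 0$ be an arbitrarily small constant, and let $L_1$ be chosen
such that

$$\int_{-\infty}^{-L_1} |g(t)|^2\, dt + \int_{L_1}^{\infty} |g(t)|^2\, dt < \xi. \qquad (15)$$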

We again split the analog channel into two parts as illustrated
in Fig. 6 and incorporate the filter $\sqrt{S_\eta(f)}$ into a
new sampling system $\mathcal{P}$. We further constrain the input and
the observed sampled output to time $[-T, T]$. For both the
original and the truncated channel, the sampled noise is not
independent, which motivates us to perform prewhitening first.

Suppose without loss of generality that the sample times
within $[-T, T]$ are $\{t_i \mid 1 \le i \le K_T\}$. For convenience of
notation, we introduce the operator $\mathcal{P}_T$ associated with the
sampling system such that

$$\mathcal{P}_T(\tilde{r}(t)) = \left[y[1], y[2], \cdots, y[K_T]\right],$$

where $\tilde{r}(t) = g(t) * x(t) + \tilde{\eta}(t)$ is the sampling system
input, $\tilde{\eta}(t)$ is white, and $y[i]$ are the corresponding sampled
outputs. Thus, for the original channel, one can write

$$\left[y[1], \cdots, y[K_T]\right] = \mathcal{P}_T(g(t) * x(t)) + \mathcal{P}_T(\tilde{\eta}(t)).$$

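Denote by $q(t_i, \tau)$ the impulse response associated with this
sampling system. Then, the noise component $\mathcal{P}_T(\tilde{\eta}(t))$ can be
whitened by left-multiplying it with a $K_T$-dimensional square
matrix $W^{-1/2}$, where $W$ satisfies

$$W_{ij} = \int_{-\infty}^{\infty} q(t_i, \tau)\, q^*(t_j, \tau)\, d\tau,$$

and the invertibility is guaranteed by our assumptions. To
see this, if we let $\tilde{\boldsymbol{\eta}} = W^{-1/2} \mathcal{P}_T(\tilde{\eta}(t))$ denote the
$K_T$-dimensional "prewhitened" noise, then one can verify that for
every $i$ and $j$:

$$E\left[\left(\mathcal{P}_T(\tilde{\eta}(t))\right)_i \left(\mathcal{P}_T(\tilde{\eta}(t))\right)_j^*\right] = \int_{-\infty}^{\infty} q(t_i, \tau)\, q^*(t_j, \tau)\, d\tau = W_{ij},$$

which immediately implies that

$$E\left[\mathcal{P}_T(\tilde{\eta}(t)) \left(\mathcal{P}_T(\tilde{\eta}(t))\right)^*\right] = W.$$

Therefore

$$E\left(\tilde{\boldsymbol{\eta}} \tilde{\boldsymbol{\eta}}^*\right) = W^{-1/2}\, E\left[\mathcal{P}_T(\tilde{\eta}(t)) \left(\mathcal{P}_T(\tilde{\eta}(t))\right)^*\right] W^{-1/2} = I.$$

Denote by $\mathcal{P}_w = W^{-1/2} \cdot \mathcal{P}_T$, and let $q_w(t_i, \tau)$ denote its
associated impulse response. Then, we can observe that

$$\int_{-\infty}^{\infty} q_w(t_i, \tau)\, q_w^*(t_j, \tau)\, d\tau = \delta[i - j], \qquad (16)$$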



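which implies that $\{q_w(t_i, \cdot), \ 1 \le i \le K_T\}$ is an orthonormal sequence in the corresponding Hilbert space.

For an operator $\mathcal{A}$ with an impulse response $a(t, \tau)$ ($-T \le \tau \le T$, $t \in \{t_i \mid 1 \le i \le K_T\}$) and input domain $\mathcal{D}(\mathcal{A})$, we denote by $\|\mathcal{A}\|_F$ the Frobenius norm of the operator $\mathcal{A}$ with respect to its associated domain. Namely,

$$\|\mathcal{A}\|_F := \begin{cases}
\sqrt{\sum_{i=1}^{K_T} \int_{-T}^{T} |a(t_i, \tau)|^2 \, d\tau}, & \text{if } \mathcal{D}(\mathcal{A}) = \{t_i \mid 1 \le i \le K_T\} \times [-T, T], \\
\sqrt{\sum_{i=1}^{K_T} \int_{-\infty}^{\infty} |a(t_i, \tau)|^2 \, d\tau}, & \text{if } \mathcal{D}(\mathcal{A}) = \{t_i \mid 1 \le i \le K_T\} \times [-\infty, \infty], \\
\sqrt{\int_{-T}^{T} \int_{-\infty}^{\infty} |a(\tau_1, \tau)|^2 \, d\tau_1 \, d\tau}, & \text{if } \mathcal{D}(\mathcal{A}) = [-\infty, \infty] \times [-T, T].
\end{cases}$$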

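This is an analog of the matrix Frobenius norm in the corresponding Hilbert space. We observe from (16) that $q_w(t_i, \cdot)$ ($1 \le i \le K_T$) forms an orthonormal sequence, and hence by Bessel's inequality [48], an operator $\mathcal{A}$ with $\mathcal{D}(\mathcal{A}) = [-\infty, \infty] \times [-T, T]$ satisfies

$$\sum_{i=1}^{K_T} \big| \langle q_w(t_i, \cdot), a(\cdot, \tau) \rangle \big|^2 \le \int_{-\infty}^{\infty} |a(\tau_1, \tau)|^2 \, d\tau_1$$

for every $\tau \in [-T, T]$. This immediately gives us

$$\begin{aligned}
\|\mathcal{P}_w \mathcal{A}\|_F^2 &= \sum_{i=1}^{K_T} \int_{-T}^{T} \left| \int_{-\infty}^{\infty} q_w(t_i, \tau_1) \, a(\tau_1, \tau) \, d\tau_1 \right|^2 d\tau \\
&= \int_{-T}^{T} \sum_{i=1}^{K_T} \big| \langle q_w(t_i, \cdot), a(\cdot, \tau) \rangle \big|^2 \, d\tau \\
&\le \int_{-T}^{T} \int_{-\infty}^{\infty} |a(\tau_1, \tau)|^2 \, d\tau_1 \, d\tau \le \|\mathcal{A}\|_F^2.
\end{aligned}$$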
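As a numerical illustration of the prewhitening and of the norm inequality above (not part of the original proof), the following sketch discretizes the kernels on a grid. The grid sizes, spacings, and random kernels are hypothetical stand-ins for $q(t_i, \tau)$ and $a(\tau_1, \tau)$; the sketch only checks that $W^{-1/2}$ produces orthonormal rows and that applying them cannot increase the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n1, n2 = 4, 300, 50            # K sampling instants; grid sizes for tau_1 and tau
d1, d2 = 0.01, 0.02               # grid spacings (hypothetical discretization)
Q = rng.standard_normal((K, n1))  # row i stands in for q(t_i, tau_1) on the grid

# W_ij ~ integral of q(t_i, tau) q*(t_j, tau) dtau, then q_w = W^{-1/2} q
W = Q @ Q.T * d1
vals, vecs = np.linalg.eigh(W)
Q_w = (vecs / np.sqrt(vals)) @ vecs.T @ Q
gram = Q_w @ Q_w.T * d1           # discrete counterpart of the orthonormality relation

A = rng.standard_normal((n1, n2))  # stands in for a(tau_1, tau)
PA = Q_w @ A * d1                  # discretized (P_w A)(t_i, tau)
lhs = np.sum(PA ** 2) * d2         # approximates ||P_w A||_F^2
rhs = np.sum(A ** 2) * d1 * d2     # approximates ||A||_F^2
```

In this discretization `gram` equals the identity up to rounding and `lhs` never exceeds `rhs`, which is the finite-dimensional form of Bessel's inequality.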



Suppose that $\{\lambda_i\}$ is the set of eigenvalues associated with the original sampled channel operator $\mathcal{P}_w \mathcal{G}$, and that $\{\tilde{\lambda}_i\}$ is the set of squared singular values associated with the operator $\mathcal{P}_w \tilde{\mathcal{G}}$ of the truncated sampled channel. We can obtain some properties of $\{\lambda_i\}$ and $\{\tilde{\lambda}_i\}$ as stated in the following lemma.

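Lemma 1. (1) Suppose that $\int_{-\infty}^{\infty} |g(t)|^2 \, dt \le C_g < \infty$ for some constant $C_g$. For any $\xi > 0$, there exists $T_0$ such that for every $T > T_0$, one has

$$\left| \frac{1}{2T} \sum_i \lambda_i - \frac{1}{2T} \sum_i \tilde{\lambda}_i \right| \le \xi + 2\sqrt{\xi C_g}.$$

(2) Under the same condition,

$$\frac{1}{2T} \sum_i \lambda_i \le \int_{-\infty}^{\infty} |g(t)|^2 \, dt < \infty.$$

(3) Suppose that $g(t) = O\big(1/t^{1+\epsilon}\big)$ for some small $\epsilon > 0$. Then there exists $T_{0,\xi}$ such that for every $T > T_{0,\xi}$, one has

$$|\lambda_i - \tilde{\lambda}_i| \le \xi.$$

Proof: See Appendix E.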
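Define the following two functions

$$C_T^{\mathcal{P}}(\nu, \{\lambda_i\}) := \frac{1}{2T} \sum_{i=1}^{K_T} \frac{1}{2} \big[ \log(\nu \lambda_i) \big]^+$$

and

$$F_T(\nu, \{\lambda_i\}) := \frac{1}{2T} \sum_{i=1}^{K_T} \left[ \nu - \frac{1}{\lambda_i} \right]^+$$

for some water level $\nu$. Note that when $F_T(\nu, \{\lambda_i\}) = P$, we have $C_T^{\mathcal{P}}(\nu, \{\lambda_i\}) = C_T^{\mathcal{P}}(f_s, P)$. By the definition of Beurling density, we can observe that $K_T \ge 2T(f_s - \epsilon)$ when $T$ is sufficiently large. Therefore,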



Kt 



2T 

which immediately yields 

2T 



< 



Kt 
1 



Is 



2P 



where the last inequality uses the results in Lemma 1. This implies that $\nu$ is bounded.
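To make the role of the water level concrete, the following sketch solves the power constraint by bisection for a finite set of squared singular values. It assumes the two functions take the usual water-filling forms $\frac{1}{2T}\sum_i [\nu - 1/\lambda_i]^+$ and $\frac{1}{2T}\sum_i \frac{1}{2}[\log(\nu\lambda_i)]^+$; the function names and the example values are hypothetical.

```python
import numpy as np

def power_used(nu, lam, T):
    """(1/2T) * sum_i [nu - 1/lam_i]^+ for water level nu."""
    return np.sum(np.maximum(nu - 1.0 / lam, 0.0)) / (2.0 * T)

def rate(nu, lam, T):
    """(1/2T) * sum_i 0.5 * [log(nu * lam_i)]^+ in nats (requires nu > 0)."""
    return np.sum(0.5 * np.maximum(np.log(nu * lam), 0.0)) / (2.0 * T)

def water_level(lam, T, P, iters=200):
    """Bisection for nu with power_used(nu) = P; power_used is nondecreasing in nu."""
    lo, hi = 0.0, 2.0 * T * P + np.max(1.0 / lam)   # power_used(hi) >= P
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power_used(mid, lam, T) < P:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = np.array([4.0, 1.0, 0.25])      # hypothetical squared singular values
nu = water_level(lam, T=0.5, P=1.0)   # two strongest modes are active here
achieved = rate(nu, lam, T=0.5)
```

For these values the weakest mode receives no power, since its floor $1/\lambda_i = 4$ lies above the water level.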

For any feasible $\nu$, the functions $\frac{1}{2} [\log(\nu x)]^+$ and $[\nu - x]^+$ have maximum slope $\frac{1}{2}\nu$ and $-1$, respectively. We therefore assume that the slopes of both functions are bounded within $[-B, B]$ for some constant $B$. Using the same water level $\nu$, the necessary power can be given for both channels as



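$$P = \frac{1}{2T} \sum_{i=1}^{K_T} \left[ \nu - \frac{1}{\lambda_i} \right]^+ \quad \text{and} \quad \tilde{P} = \frac{1}{2T} \sum_{i=1}^{K_T} \left[ \nu - \frac{1}{\tilde{\lambda}_i} \right]^+.$$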



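Combining Lemma 1 and the condition that the slope of $[\nu - x]^+$ is bounded by $B$ immediately yields that for any small $\xi > 0$, there exists $T_{0,\xi}$ such that for any $T > T_{0,\xi}$, one has

$$|P - \tilde{P}| \le \frac{B}{2T} \sum_{i=1}^{K_T} |\lambda_i - \tilde{\lambda}_i| \le \frac{B K_T}{2T} \, \xi \le B (f_s + \epsilon) \, \xi. \qquad (17)$$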



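Similarly, we can derive

$$\left| C_T^{\mathcal{P}}(\nu, \{\lambda_i\}) - C_T^{\mathcal{P}}(\nu, \{\tilde{\lambda}_i\}) \right| = \left| \frac{1}{2T} \sum_{i=1}^{K_T} \left( \frac{1}{2} \big[\log(\nu \lambda_i)\big]^+ - \frac{1}{2} \big[\log(\nu \tilde{\lambda}_i)\big]^+ \right) \right| \qquad (18)$$

$$\le \frac{B}{2T} \sum_{i=1}^{K_T} |\lambda_i - \tilde{\lambda}_i| \le B (f_s + \epsilon) \, \xi. \qquad (19)$$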



Combining (17), (19) and (14) leads to



C?{fs,P)<C?(f,,P 



B ifs + e) 



< C?{f,,P + B{f, + e)0 
<^^e+(l + ^) 



B ifs + e) 



Since $\delta$, $\epsilon$, and $\xi$ can all be made arbitrarily small, we conclude that

$$\limsup_{T \to \infty} C_T^{\mathcal{P}}(f_s, P) \le C_u(f_s, P),$$

which completes the proof.

Appendix B
Proof of Theorem 3

For each singular point x in D, there exists an interval 
A {x, e) ^ {z\\x~ z\< e} such that A{x, e)nB^ = 0. Since 
the set of rational numbers is dense along the real line, there 
exists at least one rational number inside A{x,e). Since the 
rational set is countable, we can see that D is also countable, 
and hence 

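$$\mu(D) = 0. \qquad (20)$$

Therefore, the water level $\nu$ is equal to the solution of

$$\int_{f \in \cup_i B_i} \left[ \nu - \frac{1}{|H(f)|^2} \right]^+ df = P.$$

The capacity $C_u(f_s, P)$ is then equal to

$$\int_{f \in \cup_i B_i} \frac{1}{2} \Big[ \log\big( \nu |H(f)|^2 \big) \Big]^+ df.$$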



The spectral components in $B_i$ can be perfectly reconstructed from the sequence obtained by first extracting the subinterval $B_i$ and then uniformly sampling it at rate $f_{s,i}$, which is commensurate with the analog capacity when the transmit signal is constrained to $\cup_i B_i$.
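The selection of the best frequency set followed by water-filling can be sketched numerically on a frequency grid. The sketch below is an illustration under stated assumptions: white noise of unit power spectral density, an SNR profile given as samples of $|H(f)|^2$ on grid cells of width `df`, and a sampling rate `fs` that is a multiple of `df`. The function name and the example numbers are hypothetical.

```python
import numpy as np

def capacity_upper_bound(snr, df, fs, P):
    """Keep the highest-SNR grid cells of total measure fs, then water-fill power P.

    Returns (capacity in nats per unit time, water level nu).
    """
    n_keep = int(round(fs / df))
    if n_keep < 1:
        raise ValueError("fs must cover at least one grid cell")
    best = np.sort(snr)[::-1][:n_keep]   # |H(f)|^2 on the selected set, descending
    inv = 1.0 / best                     # noise-to-signal floors, ascending
    # Exact water-filling: use the largest m whose level exceeds the m-th floor.
    for m in range(len(inv), 0, -1):
        nu = (P / df + inv[:m].sum()) / m
        if nu > inv[m - 1]:
            break
    return np.sum(0.5 * np.log(nu * best[:m])) * df, nu

snr = np.array([4.0, 1.0, 0.25, 2.0])    # hypothetical |H(f)|^2 on four cells
cap, nu = capacity_upper_bound(snr, df=0.5, fs=1.0, P=1.0)
```

With `fs = 1.0` only the two strongest cells are retained, which mirrors the statement that the optimal sampler extracts the highest-SNR set of measure $f_s$.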

Appendix C
Proof of Theorem 4

We provide here a capacity-achieving strategy using a single 
branch of sampling with modulation and filtering. 

Since the set of rational numbers is dense, there exists at least one distinct rational number inside each continuous interval $B_i$. Moreover, since the set of rational numbers is countable, one can immediately see that $\{B_i \mid i \in \mathcal{X}\}$ is countable. Note that $\mu(D) = 0$ from the argument in (20), which means that

$$\mu(B_m) = \sum_{i \in \mathcal{X}} \mu(B_i)$$

holds for some countable set $\mathcal{X}$. Therefore, for any $\xi > 0$, there exists a finite index set $\mathcal{I} \subset \mathcal{X}$ of cardinality $n$ such that

$$\sum_{i \in \mathcal{I}} \mu(B_i) \ge \mu(B_m) - \xi.$$

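Without loss of generality, let $\mathcal{I} = \{1, 2, \cdots, n\}$. For any $\epsilon > 0$, there exists an integer $N$ and a real $\delta > 0$ such that the following holds: if we define $\tilde{B}_k := [k f_s / N, (k+1) f_s / N]$, then

(i) for any $l \in \mathcal{I} = \{1, \cdots, n\}$, there exists a set of index sets $\{\mathcal{I}_l\}$ such that

$$\bigcup_{k \in \mathcal{I}_l} \tilde{B}_k \subseteq B_l \quad \text{and} \quad \mu\Big( B_l \setminus \bigcup_{k \in \mathcal{I}_l} \tilde{B}_k \Big) < \delta;$$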

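(ii)

$$\sum_{l=1}^{n} \mu\Big( B_l \setminus \bigcup_{k \in \mathcal{I}_l} \tilde{B}_k \Big) \cdot \frac{1}{2} \log\left( 1 + P \sup_f \frac{|H(f)|^2}{S_\eta(f)} \right) < \epsilon.$$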

Condition (i) guarantees that the division can approximate the original sets $\{B_l \mid 1 \le l \le n\}$ arbitrarily well, while (ii) ensures that the approximation error in (i) can only lead to a negligible gap between the approximate capacity and the upper bound. Denote by

$$\tilde{B}_m := \bigcup \big\{ \tilde{B}_k : k \in \mathcal{I}_l, \ 1 \le l \le n \big\}$$

the approximate maximizing frequency set, which contains $\tilde{N} := N \mu(\tilde{B}_m) / f_s$ subbands, each of bandwidth $f_s / N$, with an index set $\mathcal{I}_m$. We further let $L_{\min}$ and $L_{\max}$ denote $\min\{i : i \in \mathcal{I}_m\}$ and $\max\{i : i \in \mathcal{I}_m\}$, respectively. The capacity-approaching strategy can now be designed in the following spirit: the pre-modulation filter extracts several of the best subbands and suppresses all other subbands; the modulation operation scrambles spectral components while ensuring that the components in each of the above subbands dominate one subband after scrambling; finally, the post-modulation filter suppresses out-of-band components to avoid aliasing. The detailed algorithm is given as follows.

1) Let the pre-modulation filter have a frequency response equal to

$$\begin{cases} 1, & \text{if } f \in \tilde{B}_k \subseteq \tilde{B}_m \text{ for some } k; \\ 0, & \text{else.} \end{cases}$$

In other words, the pre-modulation filter suppresses all out-of-band signals and noise.

2) Let the modulation sequence $q(t)$ be periodic with period $N / f_s$, and hence the Fourier transform of $q(t)$ is an impulse train of the following form

$$\mathcal{F}(q)(f) = \sum_{l=-\infty}^{\infty} c^l \, \delta\big( f - l f_s / N \big).$$

The coefficients $c^l$ of the optimizing modulation sequence and the frequency response of the post-modulation filter $S(f)$ can be derived using Algorithm 1 of [39], which we repeat below.

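Algorithm 1

1. Initialize. Find the $\tilde{N}$ largest elements of $|H(k f_s / N)|^2 / S_\eta(k f_s / N)$ over the candidate subband indices $k$. Denote by $\{l_i \mid 1 \le i \le \tilde{N}\}$ the index set of these $\tilde{N}$ elements such that $l_1 > l_2 > \cdots > l_{\tilde{N}}$.
   Set $L := \max\{|l_1|, |l_{\tilde{N}}|\}$ and
   $L^* := \min\{k \mid k \in \mathbb{Z}, \ k \ge L, \ k \bmod N = 0\}$.
2. For $i = 1 : \tilde{N}$
      Let $\alpha := i \cdot L^* + i - l_i$.
      Set $c^{\alpha} = 1$, and $S(f + \alpha f_p) = 1$.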
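The index bookkeeping of Algorithm 1 can be sketched as follows. This is an interpretive reading of the steps above rather than a verified transcription of the algorithm in [39]: the function name, the dictionary input, and the count `n_keep` (standing in for the number of retained subbands) are hypothetical. The check at the end illustrates why the choice of $\alpha$ is useful: a subband with index $l_i$ shifted by $\alpha$ lands at $i \cdot L^* + i$, which is congruent to $i$ modulo $N$, so distinct retained subbands occupy distinct alias slots.

```python
def modulation_plan(gains, N, n_keep):
    """gains: {subband index k: SNR-like gain}. Returns a list of (l_i, alpha_i)."""
    # Step 1: indices of the n_keep largest gains, ordered so that l_1 > l_2 > ...
    top = sorted(gains, key=gains.get, reverse=True)[:n_keep]
    l = sorted(top, reverse=True)
    L = max(abs(l[0]), abs(l[-1]))
    L_star = -(-L // N) * N          # smallest multiple of N that is >= L
    # Step 2: alpha_i = i * L_star + i - l_i for i = 1, ..., n_keep
    return [(li, i * L_star + i - li) for i, li in enumerate(l, start=1)]

gains = {-3: 0.5, -1: 2.0, 0: 1.0, 2: 3.0, 5: 0.1}   # hypothetical subband gains
plan = modulation_plan(gains, N=4, n_keep=3)
slots = [(li + alpha) % 4 for li, alpha in plan]      # alias slot of each subband
```

Here the three retained subbands map to three different slots, so no two of them overlap after uniform sampling at rate $f_s$.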



Appendix D
Proof of Theorem 5

In this section, the proof is restricted to the channel with white noise, i.e. $S_\eta(f) \equiv 1$. In fact, this result extends to channels with colored noise without difficulty. Suppose the additive noise has power spectral density $S_\eta(f)$. We can then split the channel filter $H(f)$ into two parts with respective frequency responses $H(f) / \sqrt{S_\eta(f)}$ and $\sqrt{S_\eta(f)}$, as illustrated in Fig. 6. Equivalently, the channel input is passed through an LTI filter with frequency response $H(f) / \sqrt{S_\eta(f)}$, contaminated by white noise, and then passed through a filter with transfer function $\sqrt{S_\eta(f)}$ followed by the sampler. This equivalent representation immediately leads to the capacity in the presence of colored noise by substituting the corresponding terms into the capacity with white noise.

^This has a similar spirit to Gallager's treatment of analog channel capacity

Our proof proceeds in the following three steps.

1) We first introduce several correlation functions and calculate the Fourier series associated with them. These quantities are crucial for deriving the capacity. In particular, when the sampling system is periodic, the infinite correlation matrices are block Toeplitz.

2) When constrained to a finite interval $[-nT_q, nT_q]$, the sampled output is a finite vector. The sampled noise is in general not white, which motivates us to whiten it first. In fact, the correlation matrix of the sampled noise can be easily derived via the correlation functions we introduce.

3) For any interval $[-nT_q, nT_q]$, the capacity can be obtained through the Karhunen-Loeve expansion. Specifically, the capacity depends on the eigenvalues of the associated system operator, which can be related to the correlation functions. The asymptotic properties of block Toeplitz matrices guarantee the convergence when $n \to \infty$, which allows us to obtain the sampled channel capacity in closed form.

A. Correlation functions and Fourier series

For a concatenated linear system consisting of the channel filter followed by the sampling system, we denote by $s(t_0, t_1) := \int_{-\infty}^{\infty} h(\tau - t_1) \, q(t_0, \tau) \, d\tau$ its system output seen at time $t_0$ due to an impulse input at time $t_1$. For notational convenience, we define $q_k(\tau) := q(t_k, \tau)$ as the sampling output response at time $t_k$ due to an impulse input to the sampling system at time $\tau$. Two output autocorrelation functions are defined as follows:

$$\mathcal{R}_{hq}(t_k, t_l) := \int_{-\infty}^{\infty} s(t_k, \tau) \, s^*(t_l, \tau) \, d\tau$$

and

$$\mathcal{R}_q(t_k, t_l) := \int_{-\infty}^{\infty} q(t_k, \tau) \, q^*(t_l, \tau) \, d\tau.$$

For notational simplicity, we use $\mathcal{R}_{hq}(k, l)$ (resp. $\mathcal{R}_q(k, l)$) and $\mathcal{R}_{hq}(t_k, t_l)$ (resp. $\mathcal{R}_q(t_k, t_l)$) interchangeably. When the sampling system is periodic with period $T_q$, one can easily verify that $(\mathcal{R}_{hq}(k, l))_{k, l \in \mathbb{Z}}$ and $(\mathcal{R}_q(k, l))_{k, l \in \mathbb{Z}}$ are both infinite block Toeplitz matrices.

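The spectral properties associated with the system operators are captured by the Fourier series matrices $F_{hq}$, $F_{qq}$, and $F_q$. Specifically, $F_{hq}$ is an $f_s T_q$-dimensional square matrix such that for all $1 \le k, l \le f_s T_q$,

$$(F_{hq})_{k,l}(f) := \sum_{i=-\infty}^{\infty} \mathcal{R}_{hq}\big(t_k, t_{l + i f_s T_q}\big) \exp(j 2\pi i f),$$

$$(F_{qq})_{k,l}(f) := \sum_{i=-\infty}^{\infty} \mathcal{R}_q\big(t_k, t_{l + i f_s T_q}\big) \exp(j 2\pi i f).$$

The matrix $F_q(f)$ has dimensions $f_s T_q \times \infty$ and $F_h(f)$ is an infinite diagonal square matrix such that for all $l \in \mathbb{Z}$ and $1 \le k \le f_s T_q$:

$$(F_q)_{k,l}(f) := Q_k(f + l f_q),$$
$$(F_h)_{l,l}(f) := H(f + l f_q),$$

where $Q_k(f) := \mathcal{F}(q_k(\cdot)) = \mathcal{F}(q(t_k, \cdot))$.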

The key properties of the above autocorrelation functions and Fourier series are summarized in the following lemma.

Lemma 2. The Fourier series matrices satisfy

$$F_{hq} = F_q F_h F_h^* F_q^*$$

and

$$F_{qq} = F_q F_q^*.$$

Proof: See Appendix F.
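A finite-dimensional sanity check of the first identity in Lemma 2 can be sketched as follows. The infinite alias index is truncated to `n_alias` terms and the entries of $F_q$ and $F_h$ are random stand-ins for $Q_k(f + l f_q)$ and $H(f + l f_q)$ at one fixed frequency; both choices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_alias = 3, 7                     # K = f_s * T_q samples per period; truncated alias count
Fq = rng.standard_normal((K, n_alias)) + 1j * rng.standard_normal((K, n_alias))
H = rng.standard_normal(n_alias) + 1j * rng.standard_normal(n_alias)
Fh = np.diag(H)                       # diagonal matrix of aliased channel responses

# Matrix form: F_q F_h F_h^* F_q^*
Fhq_matrix = Fq @ Fh @ Fh.conj().T @ Fq.conj().T

# Entrywise form: sum over l of Q_k(f + l f_q) |H(f + l f_q)|^2 Q_i^*(f + l f_q)
Fhq_sum = np.einsum('kl,l,il->ki', Fq, np.abs(H) ** 2, Fq.conj())

min_eig = np.linalg.eigvalsh(Fhq_matrix).min()   # the product is Hermitian and PSD
```

The two forms agree entry by entry, and the resulting matrix is positive semidefinite, as expected of a correlation-type quantity.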



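B. Noise whitening

Denote by $\mathcal{Q}_k(\cdot)$ the sampling operator associated with the sample time $t_k$ such that $\mathcal{Q}_k(x) := \int_{-\infty}^{\infty} q(t_k, \tau) \, x(\tau) \, d\tau$. The correlation of the noise components $\mathcal{Q}_k(\eta)$ at different times can be calculated as

$$\begin{aligned}
\mathbb{E}\big( \mathcal{Q}_k(\eta) \, \mathcal{Q}_l^*(\eta) \big)
&= \mathbb{E}\left( \int_{-\infty}^{\infty} q(t_k, \tau_k) \, \eta(\tau_k) \, d\tau_k \int_{-\infty}^{\infty} q^*(t_l, \tau_l) \, \eta^*(\tau_l) \, d\tau_l \right) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} q(t_k, \tau_k) \, q^*(t_l, \tau_l) \, \mathbb{E}\big( \eta(\tau_k) \, \eta^*(\tau_l) \big) \, d\tau_k \, d\tau_l \\
&= \int_{-\infty}^{\infty} q(t_k, \tau) \, q^*(t_l, \tau) \, d\tau,
\end{aligned}$$

which immediately implies that $\mathcal{Q}(\eta) := [\cdots, \mathcal{Q}_1(\eta), \mathcal{Q}_2(\eta), \cdots]^T$ is a Gaussian vector with covariance matrix $\mathcal{R}_q$.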

We now constrain both the transmit interval and the observation interval to $[-nT_q, nT_q]$. Let

$$\mathbf{y}_n := \big[ y[-n f_s T_q + 1], \cdots, y[n f_s T_q - 1], y[n f_s T_q] \big]^T,$$

where the sampled output sequence satisfies

$$y[k] = \mathcal{Q}_k\big( h(t) * x(t) \big) + \mathcal{Q}_k\big( \eta(t) \big).$$

Introduce the $2 n f_s T_q$-dimensional truncated autocorrelation matrices $\mathcal{R}_{hq}^n$ and $\mathcal{R}_q^n$ such that for all $-n f_s T_q < k, l \le n f_s T_q$, we have

$$(\mathcal{R}_{hq}^n)_{k,l} = \mathcal{R}_{hq}(t_k, t_l),$$
$$(\mathcal{R}_q^n)_{k,l} = \mathcal{R}_q(t_k, t_l).$$

Clearly, the noise component of $\mathbf{y}_n$ has covariance matrix $\mathcal{R}_q^n$, which motivates us to whiten it first.
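The effect of this whitening on the quantities that determine capacity can be illustrated with small random matrices. In the sketch below, `Rq` and `Rhq` are hypothetical stand-ins for the truncated noise and signal correlation matrices; the point is that left- and right-multiplying by the inverse square root of `Rq` turns the noise covariance into the identity while leaving the spectrum equal to that of `Rq^{-1} Rhq`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
Rq = A @ A.T + n * np.eye(n)          # noise covariance stand-in (positive definite)
B = rng.standard_normal((n, n))
Rhq = B @ B.T                         # signal correlation stand-in (positive semidefinite)

w, V = np.linalg.eigh(Rq)
Rq_inv_sqrt = (V / np.sqrt(w)) @ V.T  # Rq^{-1/2} via eigendecomposition

noise_cov_after = Rq_inv_sqrt @ Rq @ Rq_inv_sqrt   # covariance of the whitened noise
R_tilde = Rq_inv_sqrt @ Rhq @ Rq_inv_sqrt          # whitened signal correlation

eig_whitened = np.sort(np.linalg.eigvalsh(R_tilde))
eig_general = np.sort(np.linalg.eigvals(np.linalg.solve(Rq, Rhq)).real)
```

The symmetric form `R_tilde` is preferred numerically because it stays Hermitian, so its eigenvalues can be computed with `eigvalsh`.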

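By left-multiplying $\mathbf{y}_n$ with $(\mathcal{R}_q^n)^{-1/2}$, we obtain a new input-output relation

$$\tilde{y}_n[k] = \tilde{\mathcal{Q}}_k\big( h(t) * x(t) \big) + \tilde{\eta}[k], \quad \forall k : |k| \le n f_s T_q,$$

where $\{\tilde{\eta}[k]\}$ are now i.i.d. Gaussian random variables each of unit variance. Denote by $\tilde{q}(t_k, \tau)$ the equivalent impulse response of this new system, and by $\tilde{s}(t_k, \tau)$ the corresponding response of the channel followed by the new system. The truncated output autocorrelation matrix $\tilde{\mathcal{R}}^n$ can be given as $(\tilde{\mathcal{R}}^n)_{k,l} := \int_{-\infty}^{\infty} \tilde{s}(t_k, \tau) \, \tilde{s}^*(t_l, \tau) \, d\tau$, which can be derived as

$$\tilde{\mathcal{R}}^n = (\mathcal{R}_q^n)^{-1/2} \, \mathcal{R}_{hq}^n \, (\mathcal{R}_q^n)^{-1/2}.$$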

C. Capacity via asymptotic properties of block Toeplitz matrices

While both $\mathcal{R}_q^n$ and $\mathcal{R}_{hq}^n$ are Toeplitz matrices, $\tilde{\mathcal{R}}^n$ is in general not a Toeplitz matrix. Utilizing asymptotic equivalence in Toeplitz matrix theory [49], we can show that $\tilde{\mathcal{R}}^n$ is asymptotically equivalent to a block-Toeplitz matrix generated by the Fourier series

$$(F_q F_q^*)^{-1/2} \, F_q F_h F_h^* F_q^* \, (F_q F_q^*)^{-1/2}.$$

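Therefore, the asymptotic spectral properties of block-Toeplitz matrices (e.g. [51]) state that for any nondecreasing continuous function $g(t)$ with bounded slope, one has

$$\lim_{n \to \infty} \frac{1}{2 n T_q} \sum_{i=1}^{2 n f_s T_q} g\big( \lambda_i(\tilde{\mathcal{R}}^n) \big) = \frac{1}{2 \pi T_q} \int_{-\pi}^{\pi} \sum_{i=1}^{f_s T_q} g(\lambda_i) \, d\omega, \qquad (21)$$

where we denote by $\lambda_i$ the $i$th eigenvalue of $(F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2}$ for notational simplicity.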
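The flavor of this asymptotic statement can be checked numerically in the simplest setting. The sketch below is an illustration under simplifying assumptions: it uses a scalar (block size one) real symmetric Toeplitz matrix with the hypothetical symbol $2 + 1.6\cos\omega$ and the test function $g(x) = \log(1 + x)$, and compares the normalized eigenvalue sum with the integral of $g$ over the symbol.

```python
import numpy as np

def avg_spectral_functional(n, g):
    """(1/n) * sum_i g(lambda_i(T_n)) for the n x n Toeplitz matrix with symbol 2 + 1.6 cos(w)."""
    T_n = 2.0 * np.eye(n) + 0.8 * (np.eye(n, k=1) + np.eye(n, k=-1))
    return np.mean(g(np.linalg.eigvalsh(T_n)))

def symbol_integral(g, m=4096):
    """(1/2pi) * integral over [-pi, pi) of g(symbol(w)) dw, via a uniform grid average."""
    w = np.linspace(-np.pi, np.pi, m, endpoint=False)
    return np.mean(g(2.0 + 1.6 * np.cos(w)))

g = np.log1p                     # nondecreasing with bounded slope on the symbol's range
finite_n = avg_spectral_functional(400, g)
limit = symbol_integral(g)
```

The gap between `finite_n` and `limit` shrinks as the matrix dimension grows; the block case replaces the scalar symbol by the eigenvalues of a matrix-valued symbol.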

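(1) The capacity of the sampled channel with an optimal water level $\nu_p$ can now be calculated as

$$C(f_s, P) = \lim_{n \to \infty} \frac{1}{2 n T_q} \sum_{i=1}^{2 n f_s T_q} \frac{1}{2} \Big[ \log\big( \nu_p \lambda_i(\tilde{\mathcal{R}}^n) \big) \Big]^+ \qquad (22)$$

$$= \int_{-f_q/2}^{f_q/2} \frac{1}{2} \sum_{i=1}^{f_s T_q} \big[ \log(\nu_p \lambda_i) \big]^+ df, \qquad (23)$$

where the equality in (23) is a consequence of (21).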



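The water level $\nu_p$ can be computed through the following parametric equation

$$\lim_{n \to \infty} \frac{1}{2 n T_q} \sum_{i=1}^{2 n f_s T_q} \left[ \nu_p - \frac{1}{\lambda_i(\tilde{\mathcal{R}}^n)} \right]^+ = P,$$

which by (21) is asymptotically equivalent to

$$\frac{1}{2 \pi T_q} \int_{-\pi}^{\pi} \sum_{i=1}^{f_s T_q} \left[ \nu_p - \frac{1}{\lambda_i} \right]^+ d\omega = P,$$

or

$$\int_{-f_q/2}^{f_q/2} \sum_{i=1}^{f_s T_q} \left[ \nu_p - \frac{1}{\lambda_i} \right]^+ df = P.$$

This completes the proof.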

(2) We consider now the scenario where equal power allocation is employed. Classical MIMO channel capacity results [50] indicate that the optimal power allocation for the transmitter is to allocate an equal amount of power to all transmit branches. It remains to see how much power is allocated to the branch associated with $\lambda_i(\tilde{\mathcal{R}}^n)$.

In fact, if the transmitter knows the channel bandwidth, almost all power (except for negligible leakage due to finite-time approximation) will be allocated inside the channel bandwidth $[0, W]$. Therefore, by the Shannon-Nyquist sampling theorem, all transmit signals can be equivalently transformed to a delta train $\sum_{i=-\infty}^{\infty} x_i \, \delta(t - i/W)$. Consider the input time block $[-nT_q, nT_q]$; there are equivalently $2 n T_q W$ transmit branches inside this time block. Since the total power is $P_{tot} = 2 n T_q P$, the power allocated to each transmit branch is given by

$$P_q = \lim_{n \to \infty} \frac{P_{tot}}{2 n W T_q} = \lim_{n \to \infty} \frac{2 n T_q P}{2 n W T_q} = \frac{P}{W}.$$

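Therefore, the capacity can be written as

$$\begin{aligned}
C_{\text{non-wf}}(f_s, P)
&= \lim_{n \to \infty} \frac{1}{2 n T_q} \sum_{i=1}^{2 n f_s T_q} \frac{1}{2} \log\left( 1 + \frac{P}{W} \lambda_i(\tilde{\mathcal{R}}^n) \right) \\
&= \frac{1}{2 \pi T_q} \int_{-\pi}^{\pi} \sum_{i=1}^{f_s T_q} \frac{1}{2} \log\left( 1 + \frac{P}{W} \lambda_i \right) d\omega \\
&= \int_{-f_q/2}^{f_q/2} \frac{1}{2} \log \det\left( I + \frac{P}{W} (F_q F_q^*)^{-1/2} F_q F_h F_h^* F_q^* (F_q F_q^*)^{-1/2} \right) df.
\end{aligned}$$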
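The passage from a sum over eigenvalues to a log-determinant can be verified at a single frequency with a short sketch. The matrices below are random stand-ins for truncated versions of $F_q$ and $F_h$, and `rho` plays the role of $P/W$; all of these values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_alias = 3, 8
Fq = rng.standard_normal((K, n_alias)) + 1j * rng.standard_normal((K, n_alias))
Fh = np.diag(rng.standard_normal(n_alias))

G = Fq @ Fq.conj().T                          # F_q F_q^*
w, V = np.linalg.eigh(G)
G_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T    # (F_q F_q^*)^{-1/2}
M = G_inv_sqrt @ Fq @ Fh @ Fh.conj().T @ Fq.conj().T @ G_inv_sqrt

rho = 2.0                                     # stands in for P / W
lam = np.linalg.eigvalsh(M)
rate_from_eigs = 0.5 * np.sum(np.log1p(rho * lam))
_, logdet = np.linalg.slogdet(np.eye(K) + rho * M)
rate_from_det = 0.5 * logdet
```

Both expressions give the same integrand, so either form can be used when evaluating the capacity integral numerically.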



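Appendix E
Proof of Lemma 1

(1) Let $\mathcal{G}$ and $\tilde{\mathcal{G}}$ denote respectively the operators associated with $g(t)$ and $\tilde{g}(t)$. Then, the triangle inequality yields

$$\|\mathcal{P}_w \mathcal{G}\|_F \le \|\mathcal{P}_w \tilde{\mathcal{G}}\|_F + \|\mathcal{P}_w (\mathcal{G} - \tilde{\mathcal{G}})\|_F \le \|\mathcal{P}_w \tilde{\mathcal{G}}\|_F + \|\mathcal{G} - \tilde{\mathcal{G}}\|_F,$$

and hence, applying the same argument with the roles of $\mathcal{G}$ and $\tilde{\mathcal{G}}$ exchanged,

$$\Big| \|\mathcal{P}_w \mathcal{G}\|_F - \|\mathcal{P}_w \tilde{\mathcal{G}}\|_F \Big| \le \|\mathcal{G} - \tilde{\mathcal{G}}\|_F. \qquad (24)$$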



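From (15) one can easily show that for any $\xi > 0$, there exists a $T_0$ such that for every $T > T_0$, one has

$$\|\mathcal{G} - \tilde{\mathcal{G}}\|_F \le \sqrt{2T \int_{-\infty}^{\infty} |g(t) - \tilde{g}(t)|^2 \, dt} \le \sqrt{2 T \xi}.$$

Additionally, suppose that $\int_{-\infty}^{\infty} |g(t)|^2 \, dt \le C_g < \infty$. Then, we have

$$\|\mathcal{P}_w \mathcal{G}\|_F \le \|\mathcal{G}\|_F \le \sqrt{2T \int_{-\infty}^{\infty} |g(t)|^2 \, dt} \le \sqrt{2 T C_g}.$$

This together with (24) immediately gives us

$$\Big| \|\mathcal{P}_w \mathcal{G}\|_F^2 - \|\mathcal{P}_w \tilde{\mathcal{G}}\|_F^2 \Big| \le 2 T \xi + 4 T \sqrt{\xi C_g}.$$

Similar to [3, Theorem 8.4.1], we can obtain that

$$\sum_i \lambda_i = \|\mathcal{P}_w \mathcal{G}\|_F^2 \quad \text{and} \quad \sum_i \tilde{\lambda}_i = \|\mathcal{P}_w \tilde{\mathcal{G}}\|_F^2.$$

Therefore,

$$\left| \frac{1}{2T} \sum_i \lambda_i - \frac{1}{2T} \sum_i \tilde{\lambda}_i \right| \le \xi + 2 \sqrt{\xi C_g}.$$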

Similarly, 



v^g 



2T 



F 2r 



v^g 



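(2) One can also bound the sum of eigenvalues as follows:

$$\frac{1}{2T} \sum_i \lambda_i = \frac{1}{2T} \|\mathcal{P}_w \mathcal{G}\|_F^2 \le \frac{1}{2T} \|\mathcal{G}\|_F^2 \le \int_{-\infty}^{\infty} |g(t)|^2 \, dt < \infty,$$

which completes the proof.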

(3) If $g(t) = O\big(1/t^{1+\epsilon}\big)$, then one can further bound



1 









;<2r( 







< 2T0 = O , 



Therefore, applying Weyl's Theorem [51] yields that



oo JT 
1 



: 



Ai — A, 



< 



v^g ~ v^g 



< 



g-g 



= o 



1 

y2e 



Therefore, for any small $\xi > 0$, there exists a constant $T_{0,\xi}$ such that for every $T > T_{0,\xi}$, one has

$$|\tilde{\lambda}_i - \lambda_i| \le \xi.$$
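The perturbation step above rests on the fact that singular values move by no more than the norm of the perturbation. A minimal numerical sketch of this fact for finite matrices is given below; the matrices are random and hypothetical, and the check is stated for singular values (whose squares play the role of $\lambda_i$ and $\tilde{\lambda}_i$), not for the operators of the paper themselves.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 40))            # stand-in for the original sampled channel
E = 1e-2 * rng.standard_normal((5, 40))     # small perturbation, e.g. from truncation
B = A + E                                   # stand-in for the truncated sampled channel

sA = np.linalg.svd(A, compute_uv=False)     # singular values, sorted in descending order
sB = np.linalg.svd(B, compute_uv=False)

max_gap = np.max(np.abs(sA - sB))           # largest shift of any singular value
spec = np.linalg.norm(E, 2)                 # spectral norm of the perturbation
frob = np.linalg.norm(E, 'fro')             # Frobenius norm of the perturbation
```

The chain `max_gap <= spec <= frob` holds for any pair of matrices of equal size, which is why a small Frobenius-norm truncation error forces the spectra of the two channels to be close.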



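Appendix F
Proof of Lemma 2

Simple manipulation yields

$$\begin{aligned}
\mathcal{R}_{hq}(t_k, t_l) &= \int_{-\infty}^{\infty} s(t_k, \tau) \, s^*(t_l, \tau) \, d\tau \\
&= \iiint q(t_k, \tau_k) \, h(\tau_k - \tau) \, h^*(\tau_l - \tau) \, q^*(t_l, \tau_l) \, d\tau_k \, d\tau_l \, d\tau \\
&= \iint q_k(\tau_k) \, \mathcal{R}_h(\tau_l - \tau_k) \, q_l^*(\tau_l) \, d\tau_k \, d\tau_l,
\end{aligned}$$

where $\mathcal{R}_h$ is defined as

$$\begin{aligned}
\mathcal{R}_h(\tau_l - \tau_k) &:= \int_{-\infty}^{\infty} h(\tau_k - \tau) \, h^*(\tau_l - \tau) \, d\tau \\
&= \int_{-\infty}^{\infty} h(\tau_k - \tau_l + \tau) \, h^*(\tau) \, d\tau \\
&= \big( h * h^{-*} \big)(\tau_k - \tau_l).
\end{aligned}$$

Here, for any function $f(t)$, we use $f^{-}(t)$ to denote $f(-t)$.

By the periodicity assumption of the sampling system, we have

$$\begin{aligned}
\mathcal{R}_{hq}\big(t_{k + a f_s T_q}, t_{l + b f_s T_q}\big)
&= \iint q(t_k + a T_q, \tau_k) \, \mathcal{R}_h(\tau_l - \tau_k) \, q^*(t_l + b T_q, \tau_l) \, d\tau_k \, d\tau_l \\
&= \iint q(t_k, \tau_k - a T_q) \, \mathcal{R}_h(\tau_l - \tau_k) \, q^*\big(t_l + (b - a) T_q, \tau_l - a T_q\big) \, d\tau_k \, d\tau_l \\
&= \mathcal{R}_{hq}\big(t_k, t_{l + (b - a) f_s T_q}\big).
\end{aligned}$$

Observing that

$$\begin{aligned}
\mathcal{R}_{hq}\big(t_k, t_{i + l f_s T_q}\big)
&= \iint q(t_k, \tau_k) \, \mathcal{R}_h(\tau_i - \tau_k) \, q^*(t_i + l T_q, \tau_i) \, d\tau_k \, d\tau_i \\
&= \iint q_k(\tau_k) \, \mathcal{R}_h(\tau_i + l T_q - \tau_k) \, q_i^*(\tau_i) \, d\tau_k \, d\tau_i \\
&= \big( \mathcal{R}_h * q_k * q_i^{-*} \big)(l T_q),
\end{aligned}$$

one can see that $(F_{hq})_{k,i}$ is simply the Fourier transform of the sampled sequence of $\mathcal{R}_h * q_k * q_i^{-*}$. The properties of the Fourier transform give

$$\begin{aligned}
\mathcal{F}\big( \mathcal{R}_h * q_k * q_i^{-*} \big)(f) &= \mathcal{F}(\mathcal{R}_h)(f) \cdot Q_k(f) \cdot Q_i^*(f) \\
&= |H(f)|^2 \, Q_k(f) \, Q_i^*(f),
\end{aligned}$$

where $Q_k(f) = \mathcal{F}(q_k)$. This immediately yields

$$(F_{hq})_{k,i} = \sum_{l=-\infty}^{\infty} Q_k(f + l f_q) \, |H(f + l f_q)|^2 \, Q_i^*(f + l f_q).$$

This allows us to express $F_{hq}$ as

$$F_{hq} = F_q F_h F_h^* F_q^*. \qquad (25)$$

Similarly, the equality $F_{qq} = F_q F_q^*$ is then an immediate consequence of (25) by setting $F_h = I$.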